CSCI E-107 Final Project

Overview and Motivation

From our proposal: The MBTA is the greater Boston area’s traditional public transit system, with over 1 million riders on a typical weekday. It comprises subway trains, longer-reach commuter trains, buses, and ferries. Since 2009, the MBTA has made available a large amount of data regarding trips on the system, including alerts. The real-time alerts are scraped and then tweeted by CodeForBoston.

Hubway is a growing bike-sharing system operating in Boston, Brookline, Cambridge, and Somerville. Its users range from young to old and live in various parts of the Greater Boston area. Users may rent bikes casually or register for regular commuting.

To our knowledge, there has been no previous analysis of the intersection of ridership between the MBTA and Hubway systems, both of which are deeply integrated into Boston’s transit landscape. Our project aims to relate usage trends through a combined analysis of both transport systems.

Initial Questions


Data

These are the sources we were looking at when putting together our proposal. The MBTA’s GTFS archive was a little intimidating because the data is scattered across several CSVs, but it looked like a rich dataset. The Hubway data looked easy to work with, but it was only available for a limited timeframe. Hubway also has a GTFS-like feed, which we could use to generate a current dataset, but that would only be useful for future projects, as the dataset would only grow in real time – it does not provide historical data. We were unclear at first how far back we would be able to get MBTA alert data. Alerts are not part of the GTFS archive, but are sent out on Twitter (originally by a third-party bot). Initial research indicated we would probably only be able to get the last 3,200 tweets, but we did not know how far back in time that would go. It was not looking likely that we would have overlapping MBTA GTFS, MBTA alert, and Hubway data. We were not deterred: if alerts and slowdowns were well correlated, then we could use the MBTA’s GTFS data in place of alerts when looking at the older Hubway data.

MBTA Archive (2009-)

Hubway Data (2011-2013)

MBTA Alerts Twitter Feed (last 3200 tweets)

NOAA weather data (last 5 years): http://w2.weather.gov/climate/index.php?wfo=box

We discovered a major pitfall after the approval of our proposal. The archive data the MBTA provides is only schedule data – the archives do not include any actual arrival, departure, travel time, or headway data. Because of the volume of the data and the way it is arranged amongst the various CSVs, we did not recognize this at first.

We reached out to the MBTA and our timing was good – they had just recently announced a preview of a new performance API for pulling down historical data, including actual arrivals, departures and headways. However, the data here only goes back to mid-2015. By this time, we also had our Twitter API keys and were able to pull down the alert data, which also only went back to 2015. With no real MBTA data prior to 2015 and no recent Hubway data, we would not be able to do any comparisons between the two transit systems. We were excited to work with MBTA’s new API.

MBTA Performance Data (July 2015-)

Revised Questions


Code

Load Libraries:

library(dplyr)
library(jsonlite)
library(lubridate)
library(readr)
library(tidyr)
library(leaflet)
library(ggplot2)
library(twitteR) #masks id, location from dplyr
library(stringr)
library(ggmap)
library(maps)
library(mapproj)
library(rworldmap)
library(dygraphs)
library(grid)
library(gridExtra)

To access the MBTA Developer APIs, we hard-code a few values:

# API information for the real-time feed
TKeyJeff <- "?api_key=tNprgYF1K027zEIVluAkCw"
TRouteURL <- "http://realtime.mbta.com/developer/api/v2/stopsbyroute"
TTravelURL <- "http://realtime.mbta.com/developer/api/v2.1/traveltimes"
THeadwaysURL <- "http://realtime.mbta.com/developer/api/v2.1/headways"
TFormat <- "&format=json"

# The start date of the class -- does affect how many archives are needed
startTime <- as.POSIXct("2016-01-25 04:00:00") 

# Load archives; first is current archive, the rest are farther back in time
TArchiveURLs <- c("http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip",
                  "http://www.mbta.com/gtfs_archive/20151211.zip") 

# Keys we generated from Twitter's website
consumer_key <- 'bpCAcAY27kfpSyOAOFXNP2PsO'
consumer_secret <- '5skjmU5FgWUA77PI4OwuBLcmv3Rr03xEKZQoG0FJJbI0wt3oMa'
access_token <- '111824999-KVpkYnMt3MZU2Bfxl9lcHZfMvdF5pYZiHQqSonE6'
access_secret <- 'Ib5N3qKxZ7CT1TuQeznHv6XobdCmjZkSVTESkVj7TwVZm'

# Weather data; we were prepared to use this, but did not end up doing so
weather<-read.csv("https://www.ncei.noaa.gov/orders/cdo/729412.csv")

In order to associate human-readable names, geographical data, and such with MBTA stops, we pull down sets of data regarding the routes.

RedLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Red", sep=""))[[1]]
MattapanLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Mattapan", sep=""))[[1]]
GreenBLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Green-B", sep=""))[[1]]
GreenCLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Green-C", sep=""))[[1]]
GreenDLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Green-D", sep=""))[[1]]
GreenELineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Green-E", sep=""))[[1]]
BlueLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Blue", sep=""))[[1]]
OrangeLineRoute <- fromJSON(paste(TRouteURL, TKeyJeff, TFormat, "&route=Orange", sep=""))[[1]]

The Red Line is somewhat special in how it appears in the data: it is “one” line, but it forks at JFK/UMass, splitting (if you’re southbound) toward Ashmont and Braintree. Since the Red Line is the one we want to experiment on first, we do some manual configuration to get a full list of stop pairs with which to query the API.

# Start with Southbound...
south_main <- data.frame(
  stop_id = c("70061","70063","70065","70067","70069","70071","70073","70075","70077","70079","70081"),
  next_stop = c("70063","70065","70067","70069","70071","70073","70075","70077","70079","70081","70083"))

south_a <- data.frame(
  stop_id = c("70083","70085","70087","70089","70091"),
  next_stop = c("70085","70087","70089","70091","70093"))

south_b <- data.frame(
  stop_id = c("70083","70095","70097","70099","70101","70103"),
  next_stop = c("70095","70097","70099","70101","70103","70105"))

# Repeat for Northbound...
north_main <- data.frame(
  stop_id = c("70084","70082","70080","70078","70076","70074","70072","70070","70068","70066","70064"),
  next_stop = c("70082","70080","70078","70076","70074","70072","70070","70068","70066","70064","70061"))
north_a <- data.frame(
  stop_id = c("70094","70092","70090","70088","70086"),
  next_stop = c("70092","70090","70088","70086","70084"))
north_b <- data.frame(
  stop_id = c("70105","70104","70102","70100","70098","70096"),
  next_stop = c("70104","70102","70100","70098","70096","70084"))

# Then rbind everything together.
distinct_stop_pairs <- rbind(south_main, south_a, south_b, north_main, north_a, north_b)

This gives us a complete set of paired stops. We don’t remove the Ashmont/Braintree/Main lists because we may want them later.
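The hand-typed lists above are just consecutive pairs drawn from an ordered vector of stop IDs; a small helper (our own sketch, not part of the project code) shows the general pattern and would make it harder to accidentally drop or transpose a stop ID:

```r
# Build (stop_id, next_stop) pairs from an ordered vector of stop IDs.
consecutive_pairs <- function(stops) {
  data.frame(stop_id   = head(stops, -1),
             next_stop = tail(stops, -1),
             stringsAsFactors = FALSE)
}

# The Ashmont branch, southbound (the same IDs as south_a above, plus the endpoint).
south_a <- consecutive_pairs(c("70083", "70085", "70087", "70089", "70091", "70093"))
```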

Next we use the MBTA’s new API to fetch travel-time data. This data consists of one record (row) for each time a train went from one stop to another. To query it, we supply a pair of stops and a range of dates, and the result is all travel times between those stops within that window. The windows are throttled to one week, so we loop through both the set of stop pairs and the set of weeks. Because the data is large and takes about half an hour to extract, we first check that we don’t already have the data locally. To force a manual refresh, simply run the appropriate segment of code (the final “else” block).

# The following can be skipped if "train_travel_times.csv.gz" is present in the working directory
if(length(ls(pattern="travel_times")) > 0) {
    # do nothing - we already have the data in the environment
    print("We already have a travel_times object. Did you mean to do that?")
} else if (file.exists("train_travel_times.csv.gz")) {
    # can skip MBTA queries and load this instead
    print("Loading previously generated data.")
    travel_times <- read.csv(gzfile("train_travel_times.csv.gz"), as.is = TRUE)
} else {
    print("Requesting data from realtime.mbta.com...")
    # Create a holding frame for the data; we do this outside the loops so that it will persist.
    travel_times <- data.frame(direction=as.numeric(character()),
                        dep_dt=as.POSIXct(character()), 
                        arr_dt=as.POSIXct(character()), 
                        travel_time_sec=as.numeric(character()),
                        benchmark_travel_time_sec=as.numeric(character()),
                        from_stop=character(), 
                        to_stop=character()) 
    
    # How many seven-day periods from start to now? (Convert explicitly to days
    # so the difftime units can't silently change.)
    numWeeks <- as.integer(as.numeric(now() - startTime, units = "days") / 7)
    
    # The outer loop cycles through every distinct pair of stops.
    for(j in 1:nrow(distinct_stop_pairs)) {
      from_j <- distinct_stop_pairs[j,]$stop_id
      to_j <- distinct_stop_pairs[j,]$next_stop
      fromStop <- paste("&from_stop=", from_j, sep="")
      toStop <- paste("&to_stop=", to_j, sep="")
      print(paste("Requesting", from_j, "to", to_j))
      
    # The inner loop cycles through each week of interest.  
      for(i in 0:numWeeks) {
        fromTime <- paste("&from_datetime=", as.numeric(startTime + days(i * 7)), sep="")
        toTime <- paste("&to_datetime=", as.numeric(startTime + days(i * 7) + days(7) - minutes(1)), sep="")
        TRequest <- paste(TTravelURL, TKeyJeff, TFormat, fromStop, toStop, fromTime, toTime, sep="")
        foo <- fromJSON(TRequest)[[1]]
        
    # Assuming we get a result back, we process it within the
    # inner loop, reformatting columns and dropping any we don't
    # plan to use. We then append it to travel_times.
        if (length(foo) > 0) {
          bar <- foo %>%
            mutate(from_stop = from_j,
              to_stop = to_j,
              dep_dt = as.POSIXct(as.integer(dep_dt), origin="1970-01-01"),
              arr_dt = as.POSIXct(as.integer(arr_dt), origin="1970-01-01"),
              travel_time_sec = as.numeric(travel_time_sec),
              benchmark_travel_time_sec = as.numeric(benchmark_travel_time_sec)) %>%
            select(-route_id, -contains("threshold"))
          travel_times <- rbind(travel_times, bar)
        } else {
          print(paste("Nothing returned for", fromStop, "to", toStop, "during period", fromTime, "-", toTime))
        }
        Sys.sleep(2) #slow down a bit
      }
    }
    
    # splitting date & time
    travel_times <- mutate(travel_times, dep_d=as.Date(dep_dt), 
                        dep_t=format(as.POSIXct(dep_dt), format="%H:%M:%S"), 
                        arr_d=as.Date(arr_dt), 
                        arr_t=format(as.POSIXct(arr_dt), format="%H:%M:%S"))
    # adding parent_station_name, lat and lon
    # Note to future self: clean up this code!
    travel_times <- bind_rows(RedLineRoute$stop[1][[1]], RedLineRoute$stop[2][[1]]) %>% 
                      select(stop_id, parent_station_name, stop_lat, stop_lon) %>% 
                      mutate(stop_id=as.integer(stop_id)) %>% 
                      rename(to_stop = stop_id, to_name = parent_station_name, to_lat = stop_lat, to_lon = stop_lon) %>% 
                      inner_join(travel_times, by="to_stop")
    travel_times <- bind_rows(RedLineRoute$stop[1][[1]], RedLineRoute$stop[2][[1]]) %>% 
                      select(stop_id, parent_station_name, stop_lat, stop_lon) %>% 
                      mutate(stop_id=as.integer(stop_id)) %>% 
                      rename(from_stop = stop_id, from_name = parent_station_name, from_lat = stop_lat, from_lon = stop_lon) %>% 
                      inner_join(travel_times, by="from_stop")
    travel_times <- arrange(travel_times, direction, dep_dt)
    
    z <- gzfile("train_travel_times.csv.gz")
    # So others don't need to pull the data again, we save off the data frame.
    write.csv(travel_times, z) 
    # Load it back in, just to be sure it always looks the same.
    travel_times <- read.csv(gzfile("train_travel_times.csv.gz"), as.is = TRUE)

}
## [1] "Loading previously generated data."
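The week-window arithmetic inside the loop is easier to see in isolation. A standalone sketch of our own (fixed-length 7-day windows in epoch seconds, ignoring the calendar-day subtleties that lubridate’s days() handles):

```r
startTime <- as.POSIXct("2016-01-25 04:00:00", tz = "EST")

# Week i spans [startTime + 7i days, startTime + 7(i+1) days - 1 minute],
# expressed as the epoch seconds the API's from_datetime/to_datetime expect.
window_bounds <- function(i) {
  from <- as.numeric(startTime) + i * 7 * 86400
  c(from = from, to = from + 7 * 86400 - 60)
}

w0 <- window_bounds(0)
```

The one-minute trim on the upper bound keeps adjacent windows from double-counting a departure that lands exactly on the boundary.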

We also want to pull what the MBTA calls “headways”; these track the time between departures at a given station. When trains are running late, headways exceed their benchmark targets. The root causes of slow service can probably be better picked apart from travel times and dwell times, but the eventual impact to riders is most cleanly seen in headways. In structure and access, this data is very similar to the travel times data, with each record representing one train leaving a station.

# The following can be skipped if "train_headway_times.csv.gz" is present in the working directory

if (length(ls(pattern="headway_times")) > 0) {
  # do nothing - we already have the data in the environment
  print("We already have a headways object. Did you mean to do that?")
} else if (file.exists("train_headway_times.csv.gz")) {
  # can skip MBTA queries and load this instead
  print("Loading previously generated data.")
  headway_times <- read.csv(gzfile("train_headway_times.csv.gz"), as.is = TRUE)
} else {
  print("Requesting data from realtime.mbta.com...")
  # create a holding frame for the data; we do this outside the loops so that it will persist.
  headway_times <- data.frame(direction=as.numeric(character()),
                              current_dep_dt=as.POSIXct(character()), 
                              previous_dep_dt=as.POSIXct(character()), 
                              headway_time_sec=as.numeric(character()),
                              benchmark_headway_time_sec=as.numeric(character()),
                              from_stop=character(), 
                              to_stop=character()) 
  
  # How many seven-day periods from start to now? (Convert explicitly to days
  # so the difftime units can't silently change.)
  numWeeks <- as.integer(as.numeric(now() - startTime, units = "days") / 7)
  
  # The outer loop cycles through every distinct pair of stops.
  for(j in 1:nrow(distinct_stop_pairs)) {
    from_j <- distinct_stop_pairs[j,]$stop_id
    to_j <- distinct_stop_pairs[j,]$next_stop
    fromStop <- paste("&stop=", from_j, sep="")
    toStop <- paste("&to_stop=", to_j, sep="")
    print(paste("Requesting", from_j, "to", to_j))
    
    # The inner loop cycles through each week of interest.  
    for(i in 0:numWeeks) {
      fromTime <- paste("&from_datetime=", as.numeric(startTime + days(i * 7)), sep="")
      toTime <- paste("&to_datetime=", as.numeric(startTime + days(i * 7) + days(7) - minutes(1)), sep="")
      TRequest <- paste(THeadwaysURL, TKeyJeff, TFormat, fromStop, toStop, fromTime, toTime, sep="")
      foo <- fromJSON(TRequest)[[1]]
      
      # Assuming we get a result back, we process it within the
      # inner loop, reformatting columns and dropping any we don't
      # plan to use. We then append it to headway_times.
      if (length(foo) > 0) {
        bar <- foo %>%
          mutate(from_stop = from_j,
                 to_stop = to_j,
                 current_dep_dt = as.POSIXct(as.integer(current_dep_dt), origin="1970-01-01"),
                 previous_dep_dt = as.POSIXct(as.integer(previous_dep_dt), origin="1970-01-01"),
                 headway_time_sec = as.numeric(headway_time_sec),
                 benchmark_headway_time_sec = as.numeric(benchmark_headway_time_sec)) %>%
          select(-route_id, -contains("threshold"))
        headway_times <- rbind(headway_times, bar)
      } else {
        print(paste("Nothing returned for", fromStop, "to", toStop, "during period", fromTime, "-", toTime))
      }
      Sys.sleep(2) #slow down a bit
    }
  }
  
  # splitting date & time
  headway_times <- mutate(headway_times, current_dep_d=as.Date(current_dep_dt), 
                          current_dep_t=format(as.POSIXct(current_dep_dt), format="%H:%M:%S"), 
                          previous_dep_d=as.Date(previous_dep_dt), 
                          previous_dep_t=format(as.POSIXct(previous_dep_dt), format="%H:%M:%S"))
  # adding parent_station_name, lat and lon
  # hey future self, here's another area for simplification.
  headway_times <- bind_rows(RedLineRoute$stop[1][[1]], RedLineRoute$stop[2][[1]]) %>% 
    select(stop_id, parent_station_name, stop_lat, stop_lon) %>% 
    mutate(stop_id=as.integer(stop_id)) %>% 
    rename(to_stop = stop_id, to_name = parent_station_name, to_lat = stop_lat, to_lon = stop_lon) %>% 
    inner_join(headway_times, by="to_stop")
  headway_times <- bind_rows(RedLineRoute$stop[1][[1]], RedLineRoute$stop[2][[1]]) %>% 
    select(stop_id, parent_station_name, stop_lat, stop_lon) %>% 
    mutate(stop_id=as.integer(stop_id)) %>% 
    rename(from_stop = stop_id, from_name = parent_station_name, from_lat = stop_lat, from_lon = stop_lon) %>% 
    inner_join(headway_times, by="from_stop")
  headway_times <- arrange(headway_times, direction, current_dep_dt)
  
  z <- gzfile("train_headway_times.csv.gz")
  write.csv(headway_times, z) #so others don't need to pull the data again
  # Load it back in, just to be sure it always looks the same.
  headway_times <- read.csv(gzfile("train_headway_times.csv.gz"), as.is = TRUE)

}
## [1] "Loading previously generated data."
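At bottom, a headway is just the gap between successive departures at a stop. A standalone sketch with toy timestamps of our own:

```r
# Three departures from one platform; headways are the successive differences.
deps <- as.POSIXct(c("2016-02-01 08:00:00",
                     "2016-02-01 08:06:00",
                     "2016-02-01 08:19:00"), tz = "EST")
headway_sec <- as.numeric(diff(deps), units = "secs")
headway_sec  # 360, then 780 -- the second gap is the kind the benchmark flags
```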

Now we have two large, related data sets. Before we can work with the travel time or headways data, we need to clean up some data types. Almost all of the data is initially stored as text.

# We should probably move this above the saving of the CSV so we don't need to do it every run
travel_times$X <- NULL
travel_times$direction <- as.character(travel_times$direction)
travel_times$from_stop <- as.character(travel_times$from_stop)
travel_times$to_stop <- as.character(travel_times$to_stop)
travel_times$dep_dt <- as.POSIXct(travel_times$dep_dt, tz="EST")
travel_times$arr_dt <- as.POSIXct(travel_times$arr_dt, tz="EST")
travel_times$dep_d <- as.POSIXct(travel_times$dep_d, tz="EST")
travel_times$arr_d <- as.POSIXct(travel_times$arr_d, tz="EST")

headway_times$X <- NULL
headway_times$direction <- as.character(headway_times$direction)
headway_times$from_stop <- as.character(headway_times$from_stop)
headway_times$to_stop <- as.character(headway_times$to_stop)
headway_times$current_dep_dt <- as.POSIXct(headway_times$current_dep_dt, tz="EST")
headway_times$previous_dep_dt <- as.POSIXct(headway_times$previous_dep_dt, tz="EST")
headway_times$current_dep_d <- as.POSIXct(headway_times$current_dep_d, tz="EST")
headway_times$previous_dep_d <- as.POSIXct(headway_times$previous_dep_d, tz="EST")

# Add a column for how far off of benchmark each value is. In theory slowdowns are transferred
# through the system linearly, so one minute of delay is one minute of delay.
travel_times$time_delta <- travel_times$travel_time_sec - travel_times$benchmark_travel_time_sec
headway_times$time_delta <- headway_times$headway_time_sec - headway_times$benchmark_headway_time_sec

# While we're at it, we'll also calculate "lateness" for headways; how many seconds (if any)
# the departure exceeded its benchmark headway time.
headway_times$lateness <- headway_times$headway_time_sec - headway_times$benchmark_headway_time_sec
headway_times[headway_times$lateness < 0,]$lateness <- 0

# Add a column for the "effective service date", so that late-night trains can be easily
# counted as part of the previous day.
travel_times$dt <- as.POSIXct(trunc(travel_times$dep_dt - seconds(14400), units = "days"))
headway_times$dt <- as.POSIXct(trunc(headway_times$current_dep_dt - seconds(14400), units = "days"))

# Adding a convenient reference column for whether the day is a weekend or not, since
# train schedules are very different. This also makes it very convenient to replace the
# contents of this column if, in future work, one wanted to replace this with an actual
# schedule reference (weekday holidays sometimes run weekend schedules)
travel_times$is_weekend <- as.POSIXlt(travel_times$dt)$wday %in% c(0,6)
headway_times$is_weekend <- as.POSIXlt(headway_times$dt)$wday %in% c(0,6)
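To confirm the “effective service date” shift above behaves as intended, a quick standalone check in base R (toy timestamp of our own):

```r
# A 1:30 AM departure should count toward the previous day's service:
# subtract four hours (14400 seconds), then truncate to the day.
dep <- as.POSIXct("2016-02-02 01:30:00", tz = "EST")
service_day <- as.POSIXct(trunc(dep - 14400, units = "days"))

format(service_day, "%Y-%m-%d")            # "2016-02-01"
as.POSIXlt(service_day)$wday %in% c(0, 6)  # FALSE -- a Monday, not a weekend
```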

When we first started on this project, we believed that the MBTA archives contained the travel-time and headway information that we wanted. It turns out we were wrong, but this archive data is still useful for finding stop information. This is the code to pull it.

#This chunk is slow.  We should probably pull out the few things that are being used into a new chunk,
#then turn off evaluation of the rest.
for(i in 1:length(TArchiveURLs)) {
    if(file.exists(basename(TArchiveURLs[i]))) {
        print(paste("Using existing", basename(TArchiveURLs[i])))
    } else {
        download.file(TArchiveURLs[i], destfile=basename(TArchiveURLs[i]))
    }
    unzip(basename(TArchiveURLs[i]), exdir=strsplit(basename(TArchiveURLs[i]), "\\.")[[1]][1], overwrite=FALSE)
}
## [1] "Using existing MBTA_GTFS.zip"
## [1] "Using existing 20151211.zip"
# The following is fixed for two archives; should be a loop based on length of TArchiveURLs
setwd(strsplit(basename(TArchiveURLs[1]), "\\.")[[1]][1])

# Get schedule data:
Tarchive_cal <- read_csv("calendar.txt") %>% mutate(zip_file=basename(getwd()))
Tarchive_calDates <- read_csv("calendar_dates.txt")
Tarchive_trips <- read.csv("trips.txt")
Tarchive_stopTimes <- read.csv("stop_times.txt")
Tarchive_stops <- read.csv("stops.txt")

# OK, as a first step, let's trim back the Tarchive tables to just the subway (RTL) data:
Tarchive_cal <- filter(Tarchive_cal, grepl("RTL", service_id))
Tarchive_calDates <- filter(Tarchive_calDates, grepl("RTL", service_id)) # exception_type; 1=added; 2=removed
Tarchive_trips <- filter(Tarchive_trips, grepl("RTL", service_id))
Tarchive_stopTimes <- filter(Tarchive_stopTimes, trip_id %in% Tarchive_trips$trip_id)

setwd("..")

# Loop, please:
setwd(strsplit(basename(TArchiveURLs[2]), "\\.")[[1]][1])

Tarchive_cal2 <- read_csv("calendar.txt") %>% mutate(zip_file=basename(getwd()))
Tarchive_calDates2 <- read_csv("calendar_dates.txt")
Tarchive_trips2 <- read.csv("trips.txt")
Tarchive_stopTimes2 <- read.csv("stop_times.txt")

# trim2 -- dealing with multiple archive files really cries out for a function here, doesn't it
Tarchive_cal2 <- filter(Tarchive_cal2, grepl("RTL", service_id))
Tarchive_calDates2 <- filter(Tarchive_calDates2, grepl("RTL", service_id)) # exception_type; 1=added; 2=removed
Tarchive_trips2 <- filter(Tarchive_trips2, grepl("RTL", service_id))
Tarchive_stopTimes2 <- filter(Tarchive_stopTimes2, trip_id %in% Tarchive_trips2$trip_id)

# combine our archives
Tarchive_cal <- bind_rows(Tarchive_cal, Tarchive_cal2)
Tarchive_calDates <- bind_rows(Tarchive_calDates, Tarchive_calDates2)
Tarchive_trips <- bind_rows(Tarchive_trips, Tarchive_trips2)
Tarchive_stopTimes <- bind_rows(Tarchive_stopTimes, Tarchive_stopTimes2)

setwd("..")

rm(Tarchive_cal2)
rm(Tarchive_calDates2)
rm(Tarchive_trips2)
rm(Tarchive_stopTimes2)

# Convert the dates:
Tarchive_cal <- mutate(Tarchive_cal, start_date=as.Date(as.character(start_date), format="%Y%m%d", origin="1970-01-01"), 
                                     end_date=as.Date(as.character(end_date), format="%Y%m%d", origin="1970-01-01"))
Tarchive_calDates <- mutate(Tarchive_calDates, date=as.Date(as.character(date), format="%Y%m%d", origin="1970-01-01"))

# Now, loop through each day between the start date and today,
# for each day, get the relevant service_ids and,
# find the corresponding trip_ids, then
# pull all the corresponding train arrival and departure times
schedule_dataset <- data.frame()
for(i in 0:(as.integer(as.numeric(now() - startTime, units = "days")))) {
    iDate <- as.Date(startTime+days(i))
    
    #service_ids
    todaysServices <- filter(Tarchive_cal, start_date<=iDate & end_date>=iDate) %>%
        select(service_id) %>%
        distinct()

    #remove exceptions
    if(iDate %in% Tarchive_calDates$date) {
        servicesRemoved <- unique(filter(Tarchive_calDates, date==iDate & exception_type==2)[[1]])
        if(length(servicesRemoved)>0) {
            todaysServices <- filter(todaysServices, !service_id %in% servicesRemoved)
        }
    }
    
    #remove remaining regularly scheduled services that don't match this day of the week
    if(wday(iDate)==1) {
        todaysServices <- filter(todaysServices, grepl("Sunday", service_id))
    } else if(wday(iDate)==7) {
        todaysServices <- filter(todaysServices, grepl("Saturday", service_id))
    } else {
        todaysServices <- filter(todaysServices, grepl("Weekday", service_id))
    }
        
    #add exceptions
    if(iDate %in% Tarchive_calDates$date) {
        servicesAdded <- filter(Tarchive_calDates, date==iDate & exception_type==1) %>% 
                         select(service_id) %>% 
                         distinct()
        if(nrow(servicesAdded)>0) {
            todaysServices <- bind_rows(todaysServices, servicesAdded)
        }
    }
        
    #trip_ids
    todaysTrips <- filter(Tarchive_trips, service_id %in% todaysServices$service_id) %>% 
                  filter(route_id=="Red") %>% 
                  distinct()
#     print(paste(i, "; Date: ", iDate, "; # Services: ", nrow(todaysServices), "; # Trips: ", nrow(todaysTrips), 
#                 "; Exception = ", (iDate %in% Tarchive_calDates$date), sep=""))

    #stop_times
    todaysStops <- filter(Tarchive_stopTimes, trip_id %in% todaysTrips$trip_id) %>% 
                   mutate(arrival_date=iDate, departure_date=iDate)
    
    schedule_dataset <- bind_rows(schedule_dataset, todaysStops)
}
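The calendar/calendar_dates bookkeeping in the loop above is dense, so here is the same rule in miniature (toy service IDs, dates, and the helper name are our own invention):

```r
# GTFS: calendar.txt gives a date range per service_id; calendar_dates.txt
# lists exceptions (exception_type 1 = service added, 2 = service removed).
cal <- data.frame(service_id = c("RTL-Weekday", "RTL-Saturday"),
                  start_date = as.Date("2016-01-01"),
                  end_date   = as.Date("2016-03-01"),
                  stringsAsFactors = FALSE)
exceptions <- data.frame(service_id = "RTL-Weekday",
                         date = as.Date("2016-02-15"),  # a toy holiday
                         exception_type = 2,
                         stringsAsFactors = FALSE)

services_for <- function(d) {
  active  <- cal$service_id[cal$start_date <= d & cal$end_date >= d]
  removed <- exceptions$service_id[exceptions$date == d &
                                   exceptions$exception_type == 2]
  active  <- setdiff(active, removed)
  # Keep only the pattern matching this day of the week, as in the loop above.
  pattern <- switch(as.character(as.POSIXlt(d)$wday),
                    "0" = "Sunday", "6" = "Saturday", "Weekday")
  active[grepl(pattern, active)]
}

services_for(as.Date("2016-02-12"))  # an ordinary Friday: "RTL-Weekday"
services_for(as.Date("2016-02-15"))  # the toy holiday Monday: nothing matches
```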

As a final table-modification step, we create a simple ordered list of stops with relevant properties, both as a lookup and so stops can appear in order in visualizations. It will still produce odd behavior around the Red Line’s fork, but more advanced visualizations take this into account. We join this table to our headway and travel-time data.

stop_sequence <- rbind(
# Southbound
  RedLineRoute$stop[[1]] %>%
  arrange(as.integer(stop_order)) %>%
  mutate(stop_seq = row_number(), heading = "Southbound") %>%
  select (stop_id, stop_name, parent_station_name, heading, stop_seq),
# Northbound
  RedLineRoute$stop[[2]] %>%
  arrange(as.integer(stop_order)) %>%
  mutate(stop_seq = row_number(), heading = "Northbound") %>%
  select (stop_id, stop_name, parent_station_name, heading, stop_seq))

headway_times <- headway_times %>%
  mutate(dep_time = current_dep_dt - dt) %>%
  left_join(.,stop_sequence, by=c("from_stop" = "stop_id"))

travel_times <- travel_times %>%
  mutate(dep_time = dep_dt - dt) %>%
  left_join(.,stop_sequence, by=c("from_stop" = "stop_id"))

A simple scatter plot of train departures from Porter Square heading inbound over time demonstrates the level of detail in this data.

porter_tt <- travel_times[travel_times$from_stop == "70065",]
plot(as.numeric(porter_tt$dep_dt - porter_tt$dt, unit = 'mins'), porter_tt$dt,
     main="train departures from Porter Square heading inbound",
     xlab="Minutes since midnight", ylab = "Date of Trip", pch=".")

Here’s one using headways, color-coding lateness (more on this later!); blue is “cool”, with low lateness:

porter_hw <- headway_times[headway_times$from_stop == "70065",]
# cm.colors() expects a single palette size, not a vector of lateness values,
# so map each departure's lateness onto a 100-color palette explicitly.
col_idx <- pmin(100, floor(porter_hw$lateness / max(porter_hw$lateness, 1) * 99) + 1)
plot(as.numeric(porter_hw$current_dep_dt - porter_hw$dt, unit = 'mins'), porter_hw$dt,
     main="Departures from Porter Square heading inbound\n color-coded by headway lateness",
     xlab="Minutes since midnight", ylab = "Date of Trip", pch=".",
     col = cm.colors(100)[col_idx])

Here we look at the times between trains by time of day. We restrict the data to only southbound weekday trains, so as to avoid mixing unrelated data in the same plot.

ggplot(headway_times %>% filter(is_weekend == FALSE, direction == "0"), aes(x = as.numeric(dep_time, units="hours"), y = time_delta)) +
  geom_point(alpha=0.05) +
  xlab("Hours (24+ is past midnight)") +
  ylab("Seconds from Benchmark") +
  ggtitle("Times between trains by time of day, weekday Southbound")

Another way to visualize these time-deltas is a boxplot.

boxplot(time_delta~parent_station_name,
     data = headway_times %>% filter(is_weekend == FALSE, direction == "0"),
     main="Times between trains by station, Weekday Southbound",
     xlab="Station", ylab = "wait time (seconds)", pch=".", ylim=c(-800, 1600))

boxplot(time_delta~stop_seq,
        data = headway_times %>% filter(is_weekend == TRUE, direction == "1"),
        main="Times between trains by station, Weekend Northbound",
        xlab="Stop Sequence", ylab = "wait time (seconds)", pch=".", ylim=c(-800, 1600))

The boxplots make clear that although much of the data sits very close to the benchmark, there are certainly instances where gaps run well over or under their benchmark times. In fact, to see the boxes clearly we had to artificially restrict the range of the y-axis.

To get a better sense of the full range of the data, let’s try a density plot. Here we look at time-deltas for headways by station; there is naturally significant overplotting, but this helps us spot unusual stations.

# The four panels differ only in direction and weekend flag, so we build them
# with a small helper rather than repeating the theme four times.
headway_density_plot <- function(dir, wknd, title) {
  ggplot(headway_times %>% filter(direction == dir, is_weekend == wknd), aes(x=time_delta)) +
    geom_density(aes(group = parent_station_name, colour = parent_station_name, fill = parent_station_name), alpha=0.2) +
    ggtitle(title) +
    xlim(-1200, 2000) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title=element_blank(),
          legend.key.size=unit(0.3, "cm"),
          legend.text=element_text(size=6),
          axis.ticks.y=element_blank(),
          axis.text.y=element_blank(),
          axis.title.y=element_blank(),
          axis.title.x=element_blank(),
          plot.title = element_text(size = 10))
}

headway_plots <- list(headway_density_plot("0", FALSE, "Weekday, Southbound"),
                      headway_density_plot("1", FALSE, "Weekday, Northbound"),
                      headway_density_plot("0", TRUE,  "Weekend, Southbound"),
                      headway_density_plot("1", TRUE,  "Weekend, Northbound"))
  
grid.arrange(headway_plots[[1]], headway_plots[[2]],headway_plots[[3]],headway_plots[[4]], ncol=2, left="Type of Day", top="Heading")
grid.rect(gp=gpar(fill=NA, col="gray"))

We can see that the distributions tend to skew right; this isn’t a surprise, because there is a hard floor on how early a train can be, but effectively no ceiling on how late it can be! We also note that even though the “center of mass” of the density plots is very close to 0 (no delay), there are a significant number of late and early trains as well. Weekend northbound trains seem to be the most consistent (density closest to 0), but even there some noteworthy delays can be seen in the plot.
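We can also quantify that right skew directly. A quick sketch (using the same `headway_times` columns as the plots above) computes a moment-based skewness coefficient per group; positive values mean a longer right tail:

```r
# Sample skewness of the headway deltas by day type and direction.
# All groups should come out positive, matching the right-skewed densities.
headway_times %>%
  group_by(is_weekend, direction) %>%
  summarize(skewness = mean((time_delta - mean(time_delta, na.rm = TRUE))^3, na.rm = TRUE) /
              sd(time_delta, na.rm = TRUE)^3)
```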

We know that some amount of headway delay can be caused by longer-than-expected travel times between stations. Let’s look at travel times for northbound, weekday trains; here we plot densities of travel times, again by station.

ggplot((travel_times %>% filter(direction == 1, is_weekend == FALSE)), aes(x=travel_time_sec)) +
  geom_density(aes(group = parent_station_name, colour= parent_station_name, fill= parent_station_name), alpha=0.2) +
  ggtitle("Travel times for Northbound trains of weekdays, by departing station") +
  xlim(0, 200) + ylim(0, .25)
## Warning: Removed 10990 rows containing non-finite values (stat_density).

We certainly get less overplotting! That makes sense, since each station-to-station connection covers a different geographic distance; what’s more interesting is how variable those travel times appear to be even along fixed routes.

Let’s take a look at service quality. One sensible metric is how long passengers are waiting on platforms for trains beyond how long they “should” be waiting under normal service. In our data this can be thought of as the difference between the actual observed times and the benchmark times provided by the MBTA. However, we only look at cases where this number is positive; the MBTA gets no extra credit for early trains! This isn’t because we’re mean-spirited: faster-than-expected headways are indistinguishable, to a typical rider, from simply happening to reach the platform at the right time, and they are often the result of backups caused by slowness earlier in the day.

Here’s a simple density plot of ALL observed lateness (on-time or early headways removed) to give an idea of what is typical.

ggplot(headway_times %>% filter(lateness > 0), aes(x=lateness)) +
  geom_density(aes(colour="All Trains", fill="All Trains"), alpha=0.5) +
  ggtitle("Total distribution of Lateness (seconds)") +
  xlim(0,2000) +
  theme(legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title=element_blank(),
        legend.key.size=unit(0.3, "cm"),
        legend.text=element_text(size=6),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(size = 10))

This shape should be familiar; it’s the right-half of the “time-delta” headway distributions above.

Let’s see if there are broad patterns by type of day or direction of travel.

ggplot(headway_times %>% filter(lateness > 0) %>% mutate(is_weekend = ifelse(is_weekend, "Weekend", "Weekday")), aes(x=lateness)) +
  geom_density(aes(group=paste(is_weekend, heading), colour=paste(is_weekend, heading), fill=paste(is_weekend, heading)), alpha=0.3) +
  ggtitle("Total distribution of Lateness (seconds)") +
  xlim(0,2000) +
  theme(legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title=element_blank(),
        legend.key.size=unit(0.3, "cm"),
        legend.text=element_text(size=6),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(size = 10))

Interesting; it looks like the MBTA’s benchmarks are pretty good, in that they appear to account well for variation by weekend and direction. If they didn’t, we would expect the distributions to show some systematic bias, but they’re actually pretty similar. Weekday southbound trains tend to have somewhat shorter lateness events, while weekend northbound events tend to be longer. Note, however, that we can’t say that weekend northbound trains are late more often, because we already threw out anything that wasn’t lateness.
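To put rough numbers on that impression, we can compare robust summaries of each group’s lateness distribution (a quick sketch over the same columns as the plot above):

```r
# Median and 90th-percentile lateness by day type and heading; if the
# benchmarks were badly biased for one group, these would diverge sharply.
headway_times %>%
  filter(lateness > 0) %>%
  mutate(is_weekend = ifelse(is_weekend, "Weekend", "Weekday")) %>%
  group_by(is_weekend, heading) %>%
  summarize(median_lateness = median(lateness),
            p90_lateness = quantile(lateness, 0.9))
```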

What if we wanted a sense of both where and when this lateness is most acutely felt? One approach would be to bucket the lateness events, add up the total amount of “overage” in each bucket, and divide by the total time in each bucket. Because of the way intervals of time are handled, though, the process is a bit roundabout. First, we build the buckets; in this case, hourly intervals defined by the number of seconds since midnight.

seconds_of_day <- data.frame(interval_start_hour = seq(0, 26, 1),
                             hour_start = seq(0, 60*60*26, 60*60),
                             hour_end = seq(60*60, 60*60*27, 60*60))

We also need to count the number of days (both weekend and weekdays) in our observation window.

count_of_days <- headway_times %>%
  # group our observations by day and day-type
  group_by(is_weekend, dt) %>%
  summarize(count = 1) %>%
  # rename the day type for visualization purposes, then count the
  # number of days of each type.
  mutate(weekend = ifelse(is_weekend, "Weekend", "Weekday")) %>%
  group_by(weekend) %>%
  summarize(days_observed = sum(count))

Now, we find how many “delay seconds” appear in each of our buckets. The really tricky part is accounting for long delays which fall partly in one bucket and partly in another.
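As a toy illustration of the clipping we do below (numbers hypothetical): a delay running from second 3500 to second 4300 of the day should contribute 100 seconds to the 0-3600 bucket and 700 seconds to the 3600-7200 bucket:

```r
# Clip a delay interval [late_t, dep_t) against each bucket [hour_start, hour_end);
# the effective lateness in a bucket is the length of the overlap.
late_t <- 3500; dep_t <- 4300
buckets <- data.frame(hour_start = c(0, 3600), hour_end = c(3600, 7200))
pmin(dep_t, buckets$hour_end) - pmax(late_t, buckets$hour_start)
## [1] 100 700
```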

total_lateness <- headway_times %>%
  # Filter to only events with actual lateness.
  filter(lateness > 0) %>%
  # Trim down the data set; it's going to get bigger before it gets smaller.
  select(from_stop, stop_seq, parent_station_name, heading, is_weekend, current_dep_dt, dt, lateness) %>%
  # Add columns determining, from the observed departure date and the lateness,
  # when the "expected" departure time was.
  mutate(lateness_start = current_dep_dt - lateness, stop_id = from_stop) %>%
  mutate(dep_t = as.numeric(current_dep_dt - dt, unit = "secs"), late_t = as.numeric(lateness_start - dt, unit = "secs")) %>%
  # Cross-join with the intervals; our R packages lack the facility to join
  # on intervals directly, so we add them all and then filter later.
  merge(., seconds_of_day, all = TRUE) %>%
  filter((late_t > hour_start & late_t < hour_end) | (dep_t > hour_start & dep_t < hour_end) | (late_t < hour_start & dep_t > hour_end)) %>%
  # Determine the "effective" starts and ends of the delays, i.e. the portion
  # of each delay interval which intersects the bucket. Subtract these to get
  # the relevant amount of delay.
  mutate(eff_late_start = ifelse(late_t < hour_start, hour_start, late_t), eff_late_end = ifelse(dep_t > hour_end, hour_end, dep_t)) %>%
  mutate(eff_lateness = eff_late_end - eff_late_start, weekend = ifelse(is_weekend, "Weekend", "Weekday")) %>%
  # Group by our categorical variables to sum up all of the lateness in
  # each bucket.
  group_by(interval_start_hour, stop_id, stop_seq, parent_station_name, heading, weekend) %>%
  summarize(total_lateness = sum(eff_lateness)) %>%
  # Join to the "count of days" to determine how many total seconds of
  # observation there were in each bucket; this is the divisor later.
  left_join(., count_of_days) %>%
  mutate(seconds_observed = (days_observed*60*60)) %>%
  select(-days_observed)
## Joining by: "weekend"

Hey, that’s cool! We can now compute how late the T was running for any combination of hour-of-day, stop, heading, and weekend/weekday, just by summing the seconds of observed lateness and dividing by the sum of seconds observed. This gives us a metric, “percent late”, which we can think of as: “for the entire time we observed the station, for what percent of that time was a train past its benchmark arrival time?” So, for example:

total_lateness %>%
  group_by(heading, stop_seq, parent_station_name) %>%
  summarize(percent_late = sum(total_lateness)/sum(seconds_observed)) %>%
  arrange(heading, stop_seq)
## Source: local data frame [37 x 4]
## Groups: heading, stop_seq [37]
## 
##       heading stop_seq parent_station_name percent_late
##         (chr)    (int)               (chr)        (dbl)
## 1  Northbound        2        Quincy Adams   0.13982521
## 2  Northbound        3       Quincy Center   0.06030801
## 3  Northbound        4           Wollaston   0.10715109
## 4  Northbound        5        North Quincy   0.11566529
## 5  Northbound        7             Shawmut   0.11318203
## 6  Northbound        8       Fields Corner   0.09378806
## 7  Northbound        9          Savin Hill   0.12145111
## 8  Northbound       10           JFK/Umass   0.14037226
## 9  Northbound       11           JFK/Umass   0.14856819
## 10 Northbound       12              Andrew   0.13644787
## ..        ...      ...                 ...          ...

Given this rate, we can compute a heatmap of delay severity:

weekday_southbound <- total_lateness %>%
  filter(heading == "Southbound", weekend == "Weekday") %>%
  group_by(parent_station_name, interval_start_hour) %>%
  summarize(percent_late = sum(total_lateness)/sum(seconds_observed))

ggplot(weekday_southbound, aes(as.ordered(interval_start_hour), parent_station_name)) +
  geom_tile(aes(fill = percent_late), color = "white") +
  scale_fill_gradient(low = "white",high = "red")

Now let’s bring in service alerts. Real-time MBTA alerts are scraped and tweeted by CodeForBoston’s per-line accounts; we pull them down via the Twitter API, caching the results as CSVs so the queries only need to run once.

# The following Twitter queries can be skipped if the cached CSVs are present
# in the working directory.
if (file.exists("red_line_tweets.csv") & file.exists("green_line_alert.csv") & file.exists("orange_line_alert.csv") & file.exists("blue_line_alert.csv")) {
    print("Loading previously generated data.")
    red_line_alert <- read_csv("red_line_tweets.csv")
    green_line_alert <- read_csv("green_line_alert.csv")
    blue_line_alert <- read_csv("blue_line_alert.csv")
    orange_line_alert <- read_csv("orange_line_alert.csv")
} else {
    
    # authorizing twitter 
    setup_twitter_oauth(consumer_key = consumer_key, 
                        consumer_secret =  consumer_secret, 
                        access_token = access_token, 
                        access_secret = access_secret)
    
    # makes dataframe for results and gets specific features
    getSpecificTweetInformation <- function(x) {
        twListToDF(x) %>% 
            select(screenName, id, text, created, favoriteCount, retweetCount)
    }
    
    # Red Line Alerts user timeline (Twitter's 3200-tweet hard limit applies)
    Red_Line_Alerts_tweets <- userTimeline('Red_Line_Alerts', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    
    # Mattapan Line Alerts user timeline (3200-tweet hard limit)
    highspeedalerts_tweets <- userTimeline('highspeedalerts', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    
    # Green Line Alerts user timeline (3200-tweet hard limit)
    GreenLineAlerts_tweets <- userTimeline('GreenLineAlerts', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    
    # For other twitter alerts: 
    # # MBTA user time line only 3200 hard limit
    # mbta_tweets <- userTimeline('MBTA', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    # 
    # # MBTA Alerts user time line only 3200 hard limit
    # mbta_alerts_tweets <- userTimeline('mbta_alerts', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    
    # Orange Line Alerts user timeline (3200-tweet hard limit)
    OrangeLineAlert_tweets <- userTimeline('OrangeLineAlert', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    
    # Blue Line Alerts user timeline (3200-tweet hard limit)
    BlueLineAlerts_tweets <- userTimeline('BlueLineAlerts', n=3200, includeRts = FALSE, excludeReplies = TRUE)
    
    # Combining tweets from different twitter handles
    red_line_alert <- getSpecificTweetInformation(Red_Line_Alerts_tweets) %>% 
        union(getSpecificTweetInformation(highspeedalerts_tweets)) 
    
    # for green line alerts
    green_line_alert <- getSpecificTweetInformation(GreenLineAlerts_tweets)
    
    # for blue line alerts
    blue_line_alert <- getSpecificTweetInformation(BlueLineAlerts_tweets)
    
    # for orange line alerts
    orange_line_alert <- getSpecificTweetInformation(OrangeLineAlert_tweets)
    
    # write all to a csv
    write_csv(red_line_alert, "red_line_tweets.csv") # filename must match the cache check above
    write_csv(green_line_alert, "green_line_alert.csv") 
    write_csv(blue_line_alert, "blue_line_alert.csv") 
    write_csv(orange_line_alert, "orange_line_alert.csv") 
}
## [1] "Loading previously generated data."
# tweet count and oldest and most recent tweet dates
red_line_alert %>% 
    group_by(screenName) %>% 
    summarise(num_of_tweets=n(), 
              oldest_date=min(as.Date.POSIXct(created)), 
              most_recent_date=max(as.Date.POSIXct(created)))
## Source: local data frame [2 x 4]
## 
##        screenName num_of_tweets oldest_date most_recent_date
##             (chr)         (int)      (date)           (date)
## 1 highspeedalerts           127  2014-06-11       2016-04-28
## 2 Red_Line_Alerts          1010  2014-06-01       2016-05-01

The Red Line alert tweets overlap our observation window, so we can join them to our travel-time data by timestamp, then try to extract the affected direction and station from the tweet text.

# To relate the tweets to data we already have, treat each tweet's
# "created" time as an arrival date ("arr_dt") and then join on it.
red_line_alert <- red_line_alert %>% 
    filter(created > startTime) %>%
    arrange(created) %>% 
    select(text, created, favoriteCount, retweetCount) %>% 
    mutate(arr_dt = created)

travel_times_alerts <- travel_times %>% 
    full_join(red_line_alert)
## Joining by: "arr_dt"
# To attach tweets to specific stops, we pull station names out of the
# tweet text and map them to stop ids.

# First step is to pull in all the relevant stop ids from our full dataset
stop_ids <- c(travel_times_alerts$from_stop, travel_times_alerts$to_stop) %>% unique()

# Second step is to get the actual stop names for these stops
stop_names_codes <- Tarchive_stops %>% 
    filter(stop_id %in% stop_ids) %>% 
    select(stop_code, stop_name) %>% 
    unite(stop_code_name, stop_code, stop_name)

# Steps to get the tweet-station pair:
# 1. Determine whether the delay or issue affects northbound, southbound, or both directions
red_line_alert <- red_line_alert %>% 
    mutate(bounded = ifelse(grepl("northbound", text, ignore.case = TRUE), "northbound",
                            ifelse(grepl("southbound", text, ignore.case = TRUE), "southbound",
                                   ifelse(grepl("both ways|both direction", text, ignore.case = TRUE), "northbound_and_southbound", ""))))

# (side note) Encode a severity level; it would be cool to graph whether
# severity increases with delay time, or a time series of severity
# for the north/southbound trains
red_line_alert <- red_line_alert %>% 
    mutate(severity = ifelse(grepl("minor delay",text, ignore.case = TRUE), 1, 
                             ifelse(grepl("moderate delay",text, ignore.case = TRUE), 2, 
                                    ifelse(grepl("severe delay",text, ignore.case = TRUE), 3, 0))))

# Things are a bit tricky here, because on the Red Line the
# southbound/northbound labels map onto inbound/outbound differently by segment
# From the information online 
# Red Line:  Toward Park Street (Green Line intersection) is Inbound; away is Outbound
# http://www.boston-discovery-guide.com/boston-subway.html
# 
# Alewife -- inbound/southbound---> Park street  <--- inbound/northbound---- Braintree/Ashmont
# Alewife <--outbound/northbound--- Park street  ---outbound/southbound----> Braintree/Ashmont
# Let's make a map of stops that are northbound and southbound

# stations with inbound/outbounds, northbound/southbound
northbound_inbound <- c("Braintree", 
                        "Quincy Adams",
                        "Quincy Center",
                        "Wollaston",
                        "North Quincy",
                        "JFK/UMASS Braintree",
                        "Broadway",
                        "South Station",
                        "Downtown Crossing - to Alewife",
                        "Savin",
                        "Fields",
                        "Shawmut",
                        "Ashmont",
                        "Park")

northbound_outbound <- c("Charles",
                         "Kendal",
                         "Central",
                         "Harvard",
                         "Porter",
                         "Davis",
                         "Alewife")

# southbound_outbound <- northbound_inbound # contains same stations
# southbound_inbound <- northbound_outbound # contains same stations

# One way of thinking about this in terms of features:
# create a column identifying the train station where the delay was tweeted
red_line_alert <- red_line_alert %>% 
    mutate(text_clone = text) %>% 
    separate(text_clone, into = c("before_at", "after_at"), sep=" at ") %>% 
    mutate(after_at = str_trim(gsub("#mbta|Station|Ave|Street|[.]", "", after_at, ignore.case = TRUE))) %>% 
    rowwise() %>% 
    mutate( after_at = ifelse(!is.na(after_at),
                              ifelse(bounded != "", 
                                     ifelse(length(agrep(after_at, northbound_inbound, ignore.case = TRUE, value = TRUE,max =6))>0 & bounded == "northbound",
                                            paste(after_at, "inbound"),
                                            ifelse(length(agrep(after_at, northbound_inbound, ignore.case = TRUE,max =6))>0 & bounded == "southbound",
                                                   paste(after_at, "outbound"),
                                                   after_at
                                            )
                                     )
                                     , 
                                     ifelse(length(agrep(after_at, northbound_outbound, ignore.case = TRUE, value = TRUE,max =6))>0 & bounded == "northbound",
                                            paste(after_at, "outbound"),
                                            ifelse(length(agrep(after_at, northbound_outbound, ignore.case = TRUE, value = TRUE, max =6))>0 & bounded == "southbound",
                                                   paste(after_at, "inbound"),
                                                   after_at
                                            )
                                     ))
                              , after_at),
            after_at_name_code = 
                ifelse(!is.na(after_at),
                       toString(agrep(after_at, stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)), '')) %>%
    ungroup() %>%
    select(-after_at, -before_at) %>% 
    rename(alerts_at_station_code = after_at_name_code) 
## Warning: Too many values at 1 locations: 76
## Warning: Too few values at 73 locations: 2, 3, 4, 18, 20, 25, 26, 27, 30,
## 31, 32, 34, 36, 37, 38, 41, 43, 44, 45, 48, ...
#red_line_alert

#################################
# Green line 
#################################

# As above: treat each tweet's "created" time as an arrival date
# ("arr_dt") so it can be joined later.
green_line_alert <- green_line_alert %>% 
  filter(created > startTime) %>%
  arrange(created) %>% 
  select(text, created, favoriteCount, retweetCount) %>% 
  mutate(arr_dt = created)
#green_line_alert


# Encode a severity level from the tweet text, as for the Red Line:
# minor = 1, moderate = 2, severe = 3.
green_line_alert <- green_line_alert %>% 
  mutate(severity = ifelse(grepl("minor",text, ignore.case = TRUE), 1, 
                           ifelse(grepl("moderate",text, ignore.case = TRUE), 2, 
                                  ifelse(grepl("severe",text, ignore.case = TRUE), 3, 0))))

# Which Green Line branch (B-E) is the alert about?
green_line_alert$greenLine <- str_extract(green_line_alert$text, "#GreenLine [B-E] ") %>% 
  substr(12,12)

#################################
# Blue line 
#################################

# As above: treat each tweet's "created" time as an arrival date
# ("arr_dt") so it can be joined later.
blue_line_alert <- blue_line_alert %>% 
  filter(created > startTime) %>%
  arrange(created) %>% 
  select(text, created, favoriteCount, retweetCount) %>% 
  mutate(arr_dt = created)
#blue_line_alert


# Encode a severity level from the tweet text, as for the Red Line:
# minor = 1, moderate = 2, severe = 3.
blue_line_alert <- blue_line_alert %>% 
  mutate(severity = ifelse(grepl("minor",text, ignore.case = TRUE), 1, 
                           ifelse(grepl("moderate",text, ignore.case = TRUE), 2, 
                                  ifelse(grepl("severe",text, ignore.case = TRUE), 3, 0))))


#################################
# Orange line 
#################################

# As above: treat each tweet's "created" time as an arrival date
# ("arr_dt") so it can be joined later.
orange_line_alert <- orange_line_alert %>% 
  filter(created > startTime) %>%
  arrange(created) %>% 
  select(text, created, favoriteCount, retweetCount) %>% 
  mutate(arr_dt = created)
#orange_line_alert


# Encode a severity level from the tweet text, as for the Red Line:
# minor = 1, moderate = 2, severe = 3.
orange_line_alert <- orange_line_alert %>% 
  mutate(severity = ifelse(grepl("minor",text, ignore.case = TRUE), 1, 
                           ifelse(grepl("moderate",text, ignore.case = TRUE), 2, 
                                  ifelse(grepl("severe",text, ignore.case = TRUE), 3, 0))))

Finally, let’s put the system on a map. We attach stop coordinates from the GTFS archive to our travel-time data and pull down each line’s route for plotting.

#Get data
allstops <- Tarchive_stops %>% 
  select(stop_code, stop_name, stop_lat, stop_lon) %>%
  filter(is.na(stop_code)==FALSE) %>%
  mutate(from_stop=stop_code,to_stop=stop_code)

travel_times_alerts$from_stop <- as.integer(travel_times_alerts$from_stop)
travel_times_alerts$to_stop <- as.integer(travel_times_alerts$to_stop)

#Merge latitude and longitude values
dataset<-travel_times_alerts %>%
  left_join(allstops,by="from_stop") %>%
  select(-stop_code,-to_stop.y) %>%
  rename(to_stop=to_stop.x, start_stop=stop_name.x, start_lat=stop_lat, start_long=stop_lon)

dataset<-dataset %>% 
  left_join(allstops,by="to_stop") %>% 
  select(-stop_code,-from_stop.y) %>%
  rename(end_stop=stop_name, end_lat=stop_lat, end_long=stop_lon, from_stop=from_stop.x)

t.lub <- ymd_hms(dataset$dep_dt)
dataset$time <- round((hour(t.lub) + minute(t.lub)/60), digits=2)

# # Weather data: we're not currently using it, but this is where it would be pulled.
# weather<-weather %>% 
#   filter(DATE>20160124,STATION_NAME=="BOSTON LOGAN INTERNATIONAL AIRPORT MA US") %>%
#   select(DATE,PRCP,SNOW) %>%
#   rename(precipitation=PRCP,snowfall=SNOW)

#RedLine
Red <-  RedLineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
Red$stop_lat<-as.numeric(Red$stop_lat)
Red$stop_lon<-as.numeric(Red$stop_lon)

RedAshmont <- Red %>% 
    filter(stop_order %in% c(1, seq(10, 110, 10), seq(130, 170, 10))) %>% 
    mutate(line="RedAshmont")
RedBraintree <- Red %>% 
    filter(stop_order %in% c(1, seq(10, 110, 10), 120, seq(180, 220, 10))) %>% 
    mutate(line="RedBraintree")

Red <- Red %>% mutate(line="Red")

#GreenBLine
GreenB <-  GreenBLineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
GreenB$stop_lat<-as.numeric(GreenB$stop_lat)
GreenB$stop_lon<-as.numeric(GreenB$stop_lon)
GreenB<-GreenB %>% mutate(line="GreenB")

#GreenCLine
GreenC <-  GreenCLineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
GreenC$stop_lat<-as.numeric(GreenC$stop_lat)
GreenC$stop_lon<-as.numeric(GreenC$stop_lon)
GreenC<-GreenC %>% mutate(line="GreenC")

#GreenDLine
GreenD <-  GreenDLineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
GreenD$stop_lat<-as.numeric(GreenD$stop_lat)
GreenD$stop_lon<-as.numeric(GreenD$stop_lon)
GreenD<-GreenD %>% mutate(line="GreenD")

#GreenELine
GreenE <-  GreenELineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
GreenE$stop_lat<-as.numeric(GreenE$stop_lat)
GreenE$stop_lon<-as.numeric(GreenE$stop_lon)
GreenE<-GreenE %>% mutate(line="GreenE")

#OrangeLine
Orange <-  OrangeLineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
Orange$stop_lat<-as.numeric(Orange$stop_lat)
Orange$stop_lon<-as.numeric(Orange$stop_lon)
Orange<-Orange %>% mutate(line="Orange")

#BlueLine
Blue <-  BlueLineRoute$stop[[1]] %>%
  select(stop_order, stop_id, stop_name, stop_lat, stop_lon)
Blue$stop_lat<-as.numeric(Blue$stop_lat)
Blue$stop_lon<-as.numeric(Blue$stop_lon)
Blue<-Blue %>% mutate(line="Blue")

all_lines<-rbind(Red,RedAshmont,RedBraintree,GreenB,GreenC,GreenD,GreenE,Orange,Blue)
all_lines$stop_lat<-as.numeric(all_lines$stop_lat)
all_lines$stop_lon<-as.numeric(all_lines$stop_lon)

all_lines <- all_lines %>% 
  separate(stop_name, into = c("stop_name", "extra"), sep="-", fill="right") %>%
  select(-extra)
## Warning: Too many values at 4 locations: 64, 93, 117, 141
#Using Leaflet for interactive map
redline<-all_lines %>% filter(line=="Red")
greenBline<-all_lines %>% filter(line=="GreenB")
greenCline<-all_lines %>% filter(line=="GreenC")
greenDline<-all_lines %>% filter(line=="GreenD")
greenEline<-all_lines %>% filter(line=="GreenE")
orangeline<-all_lines %>% filter(line=="Orange")
blueline<-all_lines %>% filter(line=="Blue")

boston <- leaflet() %>% 
  setView(lng = -71.0589, lat = 42.3601, zoom = 12) %>% 
  addProviderTiles("CartoDB.Positron") %>%
  addPolylines(data=redline,~stop_lon,~stop_lat,color="red") %>%
  addPolylines(data=greenBline,~stop_lon,~stop_lat,color="springgreen") %>%
  addPolylines(data=greenCline,~stop_lon,~stop_lat,color="palegreen") %>%
  addPolylines(data=greenDline,~stop_lon,~stop_lat,color="limegreen") %>%
  addPolylines(data=greenEline,~stop_lon,~stop_lat,color="green") %>%
  addPolylines(data=orangeline,~stop_lon,~stop_lat,color="orange") %>%
  addPolylines(data=blueline,~stop_lon,~stop_lat,color="blue") %>%
  addCircleMarkers(data=redline, ~stop_lon,~stop_lat,color="red",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenBline,~stop_lon,~stop_lat,color="springgreen",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenCline,~stop_lon,~stop_lat,color="palegreen",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenDline,~stop_lon,~stop_lat,color="limegreen",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenEline,~stop_lon,~stop_lat,color="green",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=orangeline,~stop_lon,~stop_lat,color="orange",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=blueline,~stop_lon,~stop_lat,color="blue",radius=1,popup=~stop_name)
#boston

#Average travel times between stops
traveltime<-dataset %>% group_by(start_stop,end_stop) %>%
  summarize(avg_traveltime=mean(travel_time_sec))

betweenstops<-dataset %>% left_join(traveltime,by=c("start_stop","end_stop")) %>%
  mutate(residualtime=benchmark_travel_time_sec-travel_time_sec)

#Delay vs. hour of day plot, input using start and stop_codes
#Rush hour highlighted at 7-9AM and 5-7PM
traveltimes_day<-function(fromstation,tostation){
  travel<-betweenstops %>% 
    filter(start_stop==fromstation & end_stop==tostation) %>%
    group_by(time) %>% summarize(mean(residualtime))
  
  dygraph(travel,main=c(as.character(fromstation), as.character(tostation))) %>% 
    dyAxis("y", label = "Delay in Seconds") %>%
    dyAxis("x", label="Hour of Day") %>% dyRangeSelector() %>%
    dyEvent("12.00", label="Noon", labelLoc = "bottom") %>%
    dyShading(from = "7.00", to = "9.00",color="#FFE6E6") %>%
    dyShading(from = "17.00", to = "19.00",color="#CCEBD6") }

#Examples
traveltimes_day("Park Street - to Alewife","Charles/MGH - Outbound")
traveltimes_day("Charles/MGH - Outbound","Kendall/MIT - Outbound")
# Twitter analysis ~ for green line
# Getting actual stop names for Greenline stops
greenB_stop_names_codes <- GreenB %>% 
  select(stop_id, stop_name) %>% 
  unite(stop_code_name, stop_id, stop_name)

greenC_stop_names_codes <- GreenC %>% 
  select(stop_id, stop_name) %>% 
  unite(stop_code_name, stop_id, stop_name)

greenD_stop_names_codes <- GreenD %>% 
  select(stop_id, stop_name) %>% 
  unite(stop_code_name, stop_id, stop_name)

greenE_stop_names_codes <- GreenE %>% 
  select(stop_id, stop_name) %>% 
  unite(stop_code_name, stop_id, stop_name)

### create a column identifying the station where the delay was tweeted ~ green line
green_line_alert <- green_line_alert %>% 
  mutate(text_clone = text) %>% 
  separate(text_clone, into = c("before_at", "after_at"), sep=" at ") %>% 
  mutate(after_at = str_trim(gsub("#mbta|Station|Ave|Street|[.]", "", after_at, ignore.case = TRUE))) %>% 
  rowwise() %>% 
  mutate(after_at_name_code = 
           ifelse(!is.na(after_at) & !is.na(greenLine) & greenLine == "B",
                  toString(agrep(after_at, greenB_stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)), 
                  ifelse(!is.na(after_at) & !is.na(greenLine) & greenLine == 'C',
                         toString(agrep(after_at, greenC_stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)),
                         ifelse(!is.na(after_at) & !is.na(greenLine) & greenLine == 'D',
                                toString(agrep(after_at, greenD_stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)),
                                ifelse(!is.na(after_at) & !is.na(greenLine) & greenLine == 'E',
                                        toString(agrep(after_at, greenE_stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)),
                                       ''
                                       )
                         )
                  )
           )
  ) %>% 
  ungroup() %>% 
  select(-after_at, -before_at) %>% 
  rename(alerts_at_station_code = after_at_name_code)
## Warning: Too many values at 4 locations: 39, 253, 254, 256
## Warning: Too few values at 116 locations: 2, 3, 5, 6, 9, 11, 12, 19, 20,
## 21, 22, 26, 29, 31, 40, 47, 51, 54, 59, 61, ...
## Twitter analysis ~ for blue line
## Getting actual stop names for Blue line stops

blue_line_stop_names_codes <- Blue %>% 
  select(stop_id, stop_name) %>% 
  unite(stop_code_name, stop_id, stop_name)

### create a column identifying the station where the delay was tweeted ~ blue line

blue_line_alert <- blue_line_alert %>% 
  mutate(text_clone = text) %>% 
  separate(text_clone, into = c("before_at", "after_at"), sep=" at ") %>% 
  mutate(after_at = str_trim(gsub("#mbta|Station|Ave|Street|[.]", "", after_at, ignore.case = TRUE))) %>% 
  rowwise() %>% 
  mutate(after_at_name_code = 
           ifelse(!is.na(after_at),
                  toString(agrep(after_at, blue_line_stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)), 
                  ''
           )
  ) %>% 
ungroup() %>% 
  select(-after_at, -before_at) %>% 
  rename(alerts_at_station_code = after_at_name_code) 
## Warning: Too many values at 2 locations: 10, 28
## Warning: Too few values at 14 locations: 2, 4, 7, 13, 17, 19, 20, 21, 31,
## 36, 42, 48, 60, 63
## Twitter analysis ~ for orange line
## Getting actual stop names for Orange line stops

orange_line_stop_names_codes <- Orange %>% 
  select(stop_id, stop_name) %>% 
  unite(stop_code_name, stop_id, stop_name)

### create a column identifying the station where the delay was tweeted ~ orange line

orange_line_alert <- orange_line_alert %>% 
  mutate(text_clone = text) %>% 
  separate(text_clone, into = c("before_at", "after_at"), sep=" at ") %>% 
  mutate(after_at = str_trim(gsub("#mbta|Station|Ave|Street|[.]", "", after_at, ignore.case = TRUE))) %>% 
  rowwise() %>% 
  mutate(after_at_name_code = 
           ifelse(!is.na(after_at),
                  toString(agrep(after_at, orange_line_stop_names_codes$stop_code_name, value = TRUE, ignore.case = TRUE)), 
                  ''
           )
  ) %>% 
  ungroup() %>% 
  select(-after_at, -before_at) %>% 
  rename(alerts_at_station_code = after_at_name_code)
## Warning: Too few values at 38 locations: 8, 9, 10, 19, 23, 29, 33, 50, 56,
## 71, 73, 75, 76, 77, 78, 82, 84, 88, 89, 92, ...
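For reference, the station matching above relies on base R's agrep(), which does approximate (fuzzy) matching, so small spelling differences between tweet text and GTFS stop names still find the right stop. A minimal sketch with a hypothetical misspelling:

```r
stops <- c("70045_Maverick - Inbound", "70057_Revere Beach - Inbound")
# One edit of slack lets a misspelled station name still match its stop code
agrep("Maverik", stops, max.distance = 1, value = TRUE, ignore.case = TRUE)
#> [1] "70045_Maverick - Inbound"
```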

This is a slightly simpler initial version of our map-of-Boston code, for reference.

#Map of Boston
 lon_range <- extendrange(dataset$end_long)
 lat_range <- extendrange(dataset$end_lat)
 
 gc <- geocode("boston massachusetts")
 map <- get_map(gc, maptype = "toner-lite", zoom = calc_zoom(lon_range, lat_range))
 
 ggmap(map)+
   geom_path(data=Red,aes(x=stop_lon,y=stop_lat),color="red",size=1)+
   geom_path(data=GreenB,aes(x=stop_lon,y=stop_lat),color="green",size=1)+
   geom_path(data=GreenC,aes(x=stop_lon,y=stop_lat),color="green",size=1)+
   geom_path(data=GreenD,aes(x=stop_lon,y=stop_lat),color="green",size=1)+
   geom_path(data=GreenE,aes(x=stop_lon,y=stop_lat),color="green",size=1)+
   geom_path(data=Orange,aes(x=stop_lon,y=stop_lat),color="orange",size=1)+
   geom_path(data=Blue,aes(x=stop_lon,y=stop_lat),color="blue",size=1)+
   geom_point(data=Red,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   geom_point(data=GreenB,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   geom_point(data=GreenC,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   geom_point(data=GreenD,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   geom_point(data=GreenE,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   geom_point(data=Orange,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   geom_point(data=Blue,aes(x=stop_lon,y=stop_lat),color="black",size=0.5)+
   theme(axis.ticks = element_blank(), axis.text = element_blank(),axis.title=element_blank())
#Using Leaflet
redline<-all_lines %>% filter(line=="Red")
redAshmontline<-all_lines %>% filter(line=="RedAshmont")
redBraintreeline<-all_lines %>% filter(line=="RedBraintree")
greenBline<-all_lines %>% filter(line=="GreenB")
greenCline<-all_lines %>% filter(line=="GreenC")
greenDline<-all_lines %>% filter(line=="GreenD")
greenEline<-all_lines %>% filter(line=="GreenE")
orangeline<-all_lines %>% filter(line=="Orange")
blueline<-all_lines %>% filter(line=="Blue")

boston <- leaflet() %>% 
#  setView(lng = -71.0589, lat = 42.3601, zoom = 12) %>% 
  addProviderTiles("CartoDB.Positron") %>%
#  addPolylines(data=redline,~stop_lon,~stop_lat,color="red") %>%
  addPolylines(data=redAshmontline,~stop_lon,~stop_lat,color="red") %>%
  addPolylines(data=redBraintreeline,~stop_lon,~stop_lat,color="red") %>%
  addPolylines(data=greenBline,~stop_lon,~stop_lat,color="springgreen") %>%
  addPolylines(data=greenCline,~stop_lon,~stop_lat,color="palegreen") %>%
  addPolylines(data=greenDline,~stop_lon,~stop_lat,color="limegreen") %>%
  addPolylines(data=greenEline,~stop_lon,~stop_lat,color="green") %>%
  addPolylines(data=orangeline,~stop_lon,~stop_lat,color="orange") %>%
  addPolylines(data=blueline,~stop_lon,~stop_lat,color="blue") %>%
  addCircleMarkers(data=redline, ~stop_lon,~stop_lat,color="red",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenBline,~stop_lon,~stop_lat,color="springgreen",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenCline,~stop_lon,~stop_lat,color="palegreen",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenDline,~stop_lon,~stop_lat,color="limegreen",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=greenEline,~stop_lon,~stop_lat,color="green",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=orangeline,~stop_lon,~stop_lat,color="orange",radius=1,popup=~stop_name) %>%
  addCircleMarkers(data=blueline,~stop_lon,~stop_lat,color="blue",radius=1,popup=~stop_name)
boston
#Average travel times between stops
traveltime<-dataset %>% group_by(start_stop,end_stop) %>%
  summarize(avg_traveltime=mean(travel_time_sec))

betweenstops<-dataset %>% left_join(traveltime,by=c("start_stop","end_stop")) %>%
  mutate(residualtime=avg_traveltime-travel_time_sec) #positive residual = faster than the pair's average
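For readers less familiar with dplyr, the per-pair averaging above can be sketched in base R with aggregate(), using made-up trip values (not from the dataset):

```r
# Toy trips table mirroring dataset's start_stop / end_stop / travel_time_sec
trips <- data.frame(
  start_stop      = c("Davis - Inbound", "Davis - Inbound", "Porter - Inbound"),
  end_stop        = c("Porter - Inbound", "Porter - Inbound", "Harvard - Inbound"),
  travel_time_sec = c(100, 120, 300)
)
# Mean travel time for each (start, end) pair, as in traveltime above
avg <- aggregate(travel_time_sec ~ start_stop + end_stop, trips, mean)
avg$travel_time_sec[avg$start_stop == "Davis - Inbound"]
#> [1] 110
```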

#Residual travel time vs. hour of day plot; inputs are start and end stop names
traveltimes_day<-function(fromstation,tostation){
  travel<-betweenstops %>% 
    filter(start_stop==fromstation & end_stop==tostation) %>%
    group_by(time) %>% summarize(mean(residualtime))
  
  dygraph(travel,main=c(as.character(fromstation), as.character(tostation))) %>% 
    dyAxis("y", label = "Seconds Faster than Average") %>%
    dyAxis("x", label="Hour of Day") %>% dyRangeSelector() %>%
    dyEvent("12.00", label="Noon", labelLoc = "bottom") %>%
    dyShading(from = "7.00", to = "9.00",color="#FFE6E6") %>%
    dyShading(from = "17.00", to = "19.00",color="#CCEBD6") }

#Examples
traveltimes_day("Park Street - to Alewife","Charles/MGH - Outbound")
traveltimes_day("Charles/MGH - Outbound","Kendall/MIT - Outbound")
#Tweets
travel_times_alerts$arr_dt<-ymd_hms(travel_times_alerts$arr_dt)
travel_times_alerts$dep_dt<-ymd_hms(travel_times_alerts$dep_dt)
red_line_alert$arr_dt<-ymd_hms(red_line_alert$arr_dt)
travel_times_alerts2 <- travel_times_alerts %>% full_join(red_line_alert) %>% 
  filter(!is.na(severity))
## Joining by: c("arr_dt", "text", "created", "favoriteCount", "retweetCount")
#Station of concern from alert
regexp <- "[[:digit:]]+"
travel_times_alerts2$stop_code<-as.integer(str_extract(travel_times_alerts2$alerts_at_station_code, regexp))

#Get latitude and longitude values
travel_times_alerts3 <- travel_times_alerts2 %>%
  left_join(allstops,by="stop_code") %>%
  select(dep_dt,arr_dt,severity,stop_code,stop_name.y,stop_lat,stop_lon) %>%
  filter(!is.na(stop_lat), !is.na(stop_lon))

#Work on copies of the alerts data so the original per-line tables stay intact
redalerts<-red_line_alert
greenalerts<-green_line_alert
bluealerts<-blue_line_alert
orangealerts<-orange_line_alert

#Extract station codes
regexp <- "[[:digit:]]+"
redalerts$stop_code<-as.integer(str_extract(redalerts$alerts_at_station_code, regexp))
greenalerts$stop_code<-as.integer(str_extract(greenalerts$alerts_at_station_code, regexp))
orangealerts$stop_code<-as.integer(str_extract(orangealerts$alerts_at_station_code, regexp))
bluealerts$stop_code<-as.integer(str_extract(bluealerts$alerts_at_station_code, regexp))
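The regex above pulls the first run of digits, i.e. the stop code, from strings like "70045_Maverick - Inbound". A base-R equivalent without stringr, using a hypothetical helper name:

```r
# Extract the leading stop code as an integer
code_of <- function(x) as.integer(regmatches(x, regexpr("[[:digit:]]+", x)))
code_of("70045_Maverick - Inbound")
#> [1] 70045
```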

greenalerts<-greenalerts %>% 
  select(arr_dt,severity,greenLine,stop_code) %>% rename(line=greenLine)
greenalerts$line[greenalerts$line=="B"]<-"GreenB"
greenalerts$line[greenalerts$line=="C"]<-"GreenC"
greenalerts$line[greenalerts$line=="D"]<-"GreenD"
greenalerts$line[greenalerts$line=="E"]<-"GreenE"
greenalerts<-greenalerts %>% filter(!is.na(stop_code))
greenalerts$arr_dt<-ymd_hms(greenalerts$arr_dt)

redalerts<-redalerts %>% mutate(line="Red") %>%
  select(arr_dt,severity,line,stop_code) %>% filter(!is.na(stop_code))
redalerts$arr_dt<-ymd_hms(redalerts$arr_dt)
bluealerts<-bluealerts %>% mutate(line="Blue") %>%
  select(arr_dt,severity,line,stop_code) %>% filter(!is.na(stop_code))
bluealerts$arr_dt<-ymd_hms(bluealerts$arr_dt)
orangealerts<-orangealerts %>% mutate(line="Orange") %>%
  select(arr_dt,severity,line,stop_code) %>% filter(!is.na(stop_code))
orangealerts$arr_dt<-ymd_hms(orangealerts$arr_dt)

#Merge all alerts
allalerts<-rbind(redalerts,greenalerts,bluealerts,orangealerts)

#Get latitude and longitude values
allstops <- allstops %>% select(-to_stop)
allalerts <- allalerts %>% left_join(allstops,by="stop_code") %>% filter(severity!=0)
  
#All lines severity of alerts
boston %>% 
  addMarkers(data=allalerts,~stop_lon,~stop_lat,clusterOptions=markerClusterOptions())

Final Analysis

asdf

#remove this when we're done asdf
ls.str()
## access_secret :  chr "[redacted]"
## access_token :  chr "[redacted]"
## all_lines : 'data.frame':    187 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis " "Porter " "Harvard " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "Red" "Red" "Red" "Red" ...
## allalerts : Classes 'tbl_df', 'tbl' and 'data.frame':    280 obs. of  8 variables:
##  $ arr_dt   : POSIXct, format: "2016-01-25 23:36:17" "2016-01-28 23:00:48" ...
##  $ severity : num  2 2 2 1 2 2 2 2 2 1 ...
##  $ line     : chr  "Red" "Red" "Red" "Red" ...
##  $ stop_code: int  70069 70061 70073 70073 70077 70101 70069 70075 70079 70097 ...
##  $ stop_name: Factor w/ 7470 levels " Franklin St @ Washington St",..: 1802 761 1985 1985 2504 5556 1802 5309 6161 4960 ...
##  $ stop_lat : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ from_stop: int  70069 70061 70073 70073 70077 70101 70069 70075 70079 70097 ...
## allstops : 'data.frame': 8104 obs. of  5 variables:
##  $ stop_code: int  1 10 10000 10003 10005 10006 10007 10008 10009 1001 ...
##  $ stop_name: Factor w/ 7470 levels " Franklin St @ Washington St",..: 7113 6360 6452 758 756 757 481 759 3287 4922 ...
##  $ stop_lat : num  42.3 42.3 42.4 42.3 42.3 ...
##  $ stop_lon : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ from_stop: int  1 10 10000 10003 10005 10006 10007 10008 10009 1001 ...
## betweenstops : 'data.frame': 441346 obs. of  32 variables:
##  $ direction                : chr  "0" "0" "0" "0" ...
##  $ dep_dt                   : POSIXct, format: "2016-01-25 05:20:13" "2016-01-25 05:22:26" ...
##  $ arr_dt                   : POSIXct, format: "2016-01-25 05:21:23" "2016-01-25 05:23:59" ...
##  $ travel_time_sec          : int  70 93 147 102 71 295 115 76 134 119 ...
##  $ benchmark_travel_time_sec: int  120 180 180 120 120 120 180 120 180 180 ...
##  $ from_stop                : int  70063 70065 70067 70069 70063 70071 70065 70063 70067 70065 ...
##  $ to_stop                  : int  70065 70067 70069 70071 70065 70073 70067 70065 70069 70067 ...
##  $ dep_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ dep_t                    : chr  "05:20:13" "05:22:26" "05:24:54" "05:28:11" ...
##  $ arr_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ arr_t                    : chr  "05:21:23" "05:23:59" "05:27:21" "05:29:53" ...
##  $ time_delta               : int  -50 -87 -33 -18 -49 175 -65 -44 -46 -61 ...
##  $ dt                       : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ is_weekend               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ dep_time                 :Class 'difftime'  atomic [1:441346] 5.34 5.37 5.42 5.47 5.5 ...
##  $ start_stop               : chr  "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" "Central - Inbound" ...
##  $ parent_station_name      : chr  "Davis" "Porter" "Harvard" "Central" ...
##  $ heading                  : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ stop_seq                 : int  2 3 4 5 2 6 3 2 4 3 ...
##  $ text                     : chr  NA NA NA NA ...
##  $ created                  : POSIXct, format: NA NA ...
##  $ favoriteCount            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ retweetCount             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ stop_name.y              : Factor w/ 7470 levels " Franklin St @ Washington St",..: 2376 5486 3208 1802 2376 3610 5486 2376 3208 5486 ...
##  $ start_lat                : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ start_long               : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ end_stop                 : Factor w/ 7470 levels " Franklin St @ Washington St",..: 5486 3208 1802 3610 5486 1985 3208 5486 1802 3208 ...
##  $ end_lat                  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ end_long                 : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ time                     : num  5.33 5.37 5.4 5.47 5.48 5.5 5.52 5.57 5.57 5.6 ...
##  $ avg_traveltime           : num  67.4 118.1 146.8 103.9 67.4 ...
##  $ residualtime             : num  -2.642 25.069 -0.174 1.853 -3.642 ...
## Blue : 'data.frame': 12 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70060" "70057" "70055" "70053" ...
##  $ stop_name : chr  "Wonderland" "Revere Beach - Inbound" "Beachmont - Inbound" "Suffolk Downs - Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71 -71 -71 -71 -71 ...
##  $ line      : chr  "Blue" "Blue" "Blue" "Blue" ...
## blue_line_alert : Classes 'tbl_df' and 'data.frame': 64 obs. of  7 variables:
##  $ text                  : chr  "Bowdoin will be closed beginning at approximately 8:45 p.m. through the end of service on Feb 28-Mar 2. Please board and exit t"| __truncated__ "#BlueLine experiencing minor westbound delays due to a signal problem between Wonderland and Revere Beach Stations. #mbta" "#BlueLine experiencing moderate delays in both directions due to a disabled train at Maverick. #mbta" "#BlueLine experiencing moderate delays due to disabled train #mbta" ...
##  $ created               : POSIXct, format: "2016-02-01 13:28:36" "2016-02-02 11:17:24" ...
##  $ favoriteCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ retweetCount          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ arr_dt                : POSIXct, format: "2016-02-01 13:28:36" "2016-02-02 11:17:24" ...
##  $ severity              : num  0 1 2 2 1 1 2 0 2 0 ...
##  $ alerts_at_station_code: chr  "" "" "70045_Maverick - Inbound" "" ...
## blue_line_stop_names_codes : 'data.frame':   12 obs. of  1 variable:
##  $ stop_code_name: chr  "70060_Wonderland" "70057_Revere Beach - Inbound" "70055_Beachmont - Inbound" "70053_Suffolk Downs - Inbound" ...
## bluealerts : Classes 'tbl_df', 'tbl' and 'data.frame':   44 obs. of  4 variables:
##  $ arr_dt   : POSIXct, format: "2016-02-05 20:03:46" "2016-02-08 20:00:59" ...
##  $ severity : num  2 1 1 0 2 2 2 2 2 1 ...
##  $ line     : chr  "Blue" "Blue" "Blue" "Blue" ...
##  $ stop_code: int  70045 70060 70045 70060 70060 70043 70060 70060 70049 70060 ...
## blueline : 'data.frame': 12 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70060" "70057" "70055" "70053" ...
##  $ stop_name : chr  "Wonderland" "Revere Beach " "Beachmont " "Suffolk Downs " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71 -71 -71 -71 -71 ...
##  $ line      : chr  "Blue" "Blue" "Blue" "Blue" ...
## BlueLineRoute : 'data.frame':    2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Westbound" "Eastbound"
##  $ stop          :List of 2
## boston : List of 8
##  $ x            :List of 2
##  $ width        : NULL
##  $ height       : NULL
##  $ sizingPolicy :List of 6
##  $ dependencies :List of 2
##  $ elementId    : NULL
##  $ preRenderHook: NULL
##  $ jsHooks      : list()
## consumer_key :  chr "[redacted]"
## consumer_secret :  chr "[redacted]"
## count_of_days : Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
##  $ weekend      : chr  "Weekday" "Weekend"
##  $ days_observed: num  70 28
## dataset : 'data.frame':  441346 obs. of  30 variables:
##  $ direction                : chr  "0" "0" "0" "0" ...
##  $ dep_dt                   : POSIXct, format: "2016-01-25 05:20:13" "2016-01-25 05:22:26" ...
##  $ arr_dt                   : POSIXct, format: "2016-01-25 05:21:23" "2016-01-25 05:23:59" ...
##  $ travel_time_sec          : int  70 93 147 102 71 295 115 76 134 119 ...
##  $ benchmark_travel_time_sec: int  120 180 180 120 120 120 180 120 180 180 ...
##  $ from_stop                : int  70063 70065 70067 70069 70063 70071 70065 70063 70067 70065 ...
##  $ to_stop                  : int  70065 70067 70069 70071 70065 70073 70067 70065 70069 70067 ...
##  $ dep_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ dep_t                    : chr  "05:20:13" "05:22:26" "05:24:54" "05:28:11" ...
##  $ arr_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ arr_t                    : chr  "05:21:23" "05:23:59" "05:27:21" "05:29:53" ...
##  $ time_delta               : int  -50 -87 -33 -18 -49 175 -65 -44 -46 -61 ...
##  $ dt                       : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ is_weekend               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ dep_time                 :Class 'difftime'  atomic [1:441346] 5.34 5.37 5.42 5.47 5.5 ...
##  $ start_stop               : chr  "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" "Central - Inbound" ...
##  $ parent_station_name      : chr  "Davis" "Porter" "Harvard" "Central" ...
##  $ heading                  : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ stop_seq                 : int  2 3 4 5 2 6 3 2 4 3 ...
##  $ text                     : chr  NA NA NA NA ...
##  $ created                  : POSIXct, format: NA NA ...
##  $ favoriteCount            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ retweetCount             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ stop_name.y              : Factor w/ 7470 levels " Franklin St @ Washington St",..: 2376 5486 3208 1802 2376 3610 5486 2376 3208 5486 ...
##  $ start_lat                : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ start_long               : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ end_stop                 : Factor w/ 7470 levels " Franklin St @ Washington St",..: 5486 3208 1802 3610 5486 1985 3208 5486 1802 3208 ...
##  $ end_lat                  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ end_long                 : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ time                     : num  5.33 5.37 5.4 5.47 5.48 5.5 5.52 5.57 5.57 5.6 ...
## distinct_stop_pairs : 'data.frame':  44 obs. of  2 variables:
##  $ stop_id  : Factor w/ 43 levels "70061","70063",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ next_stop: Factor w/ 43 levels "70063","70065",..: 1 2 3 4 5 6 7 8 9 10 ...
## green_line_alert : Classes 'tbl_df' and 'data.frame':    270 obs. of  8 variables:
##  $ text                  : chr  "#GreenLine D branch experiencing moderate delays due to a disabled train at Reservoir #mbta" "#GreenLine D branch experiencing minor delays in both directions due to a signal problem between Chestnut Hill and Reservoir St"| __truncated__ "#GreenLine C branch experiencing minor eastbound delays due to disabled train #mbta" "#GreenLine B branch experiencing minor eastbound delays due to a disabled train at Packard's Corner Station. #mbta" ...
##  $ created               : POSIXct, format: "2016-02-15 20:07:46" "2016-02-16 13:27:50" ...
##  $ favoriteCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ retweetCount          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ arr_dt                : POSIXct, format: "2016-02-15 20:07:46" "2016-02-16 13:27:50" ...
##  $ severity              : num  2 1 1 1 1 2 2 1 2 2 ...
##  $ greenLine             : chr  "D" "D" "C" "B" ...
##  $ alerts_at_station_code: chr  "70175_Reservoir - Outbound" "" "" "70135_Packards Corner - Outbound" ...
## greenalerts : Classes 'tbl_df', 'tbl' and 'data.frame':  87 obs. of  4 variables:
##  $ arr_dt   : POSIXct, format: "2016-02-15 20:07:46" "2016-02-16 16:41:24" ...
##  $ severity : num  2 1 2 2 2 2 2 1 2 1 ...
##  $ line     : chr  "GreenD" "GreenB" "GreenB" "GreenB" ...
##  $ stop_code: int  70175 70135 70139 70135 70187 70107 70165 70206 70125 70208 ...
## GreenB : 'data.frame':   29 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere - Inbound" "Science Park - Inbound" "North Station - Green Line Inbound" "Haymarket - Green Line Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenB" "GreenB" "GreenB" "GreenB" ...
## greenB_stop_names_codes : 'data.frame':  29 obs. of  1 variable:
##  $ stop_code_name: chr  "70210_Lechmere - Inbound" "70208_Science Park - Inbound" "70206_North Station - Green Line Inbound" "70204_Haymarket - Green Line Inbound" ...
## greenBline : 'data.frame':   29 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere " "Science Park " "North Station " "Haymarket " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenB" "GreenB" "GreenB" "GreenB" ...
## GreenBLineRoute : 'data.frame':  2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Westbound" "Eastbound"
##  $ stop          :List of 2
## GreenC : 'data.frame':   24 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere - Inbound" "Science Park - Inbound" "North Station - Green Line Inbound" "Haymarket - Green Line Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenC" "GreenC" "GreenC" "GreenC" ...
## greenC_stop_names_codes : 'data.frame':  24 obs. of  1 variable:
##  $ stop_code_name: chr  "70210_Lechmere - Inbound" "70208_Science Park - Inbound" "70206_North Station - Green Line Inbound" "70204_Haymarket - Green Line Inbound" ...
## greenCline : 'data.frame':   24 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere " "Science Park " "North Station " "Haymarket " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenC" "GreenC" "GreenC" "GreenC" ...
## GreenCLineRoute : 'data.frame':  2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Westbound" "Eastbound"
##  $ stop          :List of 2
## GreenD : 'data.frame':   24 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere - Inbound" "Science Park - Inbound" "North Station - Green Line Inbound" "Haymarket - Green Line Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenD" "GreenD" "GreenD" "GreenD" ...
## greenD_stop_names_codes : 'data.frame':  24 obs. of  1 variable:
##  $ stop_code_name: chr  "70210_Lechmere - Inbound" "70208_Science Park - Inbound" "70206_North Station - Green Line Inbound" "70204_Haymarket - Green Line Inbound" ...
## greenDline : 'data.frame':   24 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere " "Science Park " "North Station " "Haymarket " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenD" "GreenD" "GreenD" "GreenD" ...
## GreenDLineRoute : 'data.frame':  2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Westbound" "Eastbound"
##  $ stop          :List of 2
## GreenE : 'data.frame':   20 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere - Inbound" "Science Park - Inbound" "North Station - Green Line Inbound" "Haymarket - Green Line Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenE" "GreenE" "GreenE" "GreenE" ...
## greenE_stop_names_codes : 'data.frame':  20 obs. of  1 variable:
##  $ stop_code_name: chr  "70210_Lechmere - Inbound" "70208_Science Park - Inbound" "70206_North Station - Green Line Inbound" "70204_Haymarket - Green Line Inbound" ...
## greenEline : 'data.frame':   20 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70210" "70208" "70206" "70204" ...
##  $ stop_name : chr  "Lechmere " "Science Park " "North Station " "Haymarket " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "GreenE" "GreenE" "GreenE" "GreenE" ...
## GreenELineRoute : 'data.frame':  2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Westbound" "Eastbound"
##  $ stop          :List of 2
## headway_plots : List of 4
##  $ :List of 9
##  $ :List of 9
##  $ :List of 9
##  $ :List of 9
## headway_times : 'data.frame':    397153 obs. of  21 variables:
##  $ prev_route_id             : chr  "Red" "Red" "Red" "Red" ...
##  $ direction                 : chr  "0" "0" "0" "0" ...
##  $ current_dep_dt            : POSIXct, format: "2016-01-25 05:29:48" "2016-01-25 05:31:47" ...
##  $ previous_dep_dt           : POSIXct, format: "2016-01-25 05:20:13" "2016-01-25 05:22:26" ...
##  $ headway_time_sec          : int  575 561 272 594 279 617 279 577 406 279 ...
##  $ benchmark_headway_time_sec: int  480 405 405 420 405 405 420 405 405 405 ...
##  $ from_stop                 : chr  "70063" "70065" "70063" "70067" ...
##  $ to_stop                   : chr  "70065" "70067" "70065" "70069" ...
##  $ current_dep_d             : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ current_dep_t             : chr  "05:29:48" "05:31:47" "05:34:20" "05:34:48" ...
##  $ previous_dep_d            : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ previous_dep_t            : chr  "05:20:13" "05:22:26" "05:29:48" "05:24:54" ...
##  $ time_delta                : int  95 156 -133 174 -126 212 -141 172 1 -126 ...
##  $ lateness                  : num  95 156 0 174 0 212 0 172 1 0 ...
##  $ dt                        : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ is_weekend                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ dep_time                  :Class 'difftime'  atomic [1:397153] 5.5 5.53 5.57 5.58 5.61 ...
##  $ stop_name                 : chr  "Davis - Inbound" "Porter - Inbound" "Davis - Inbound" "Harvard - Inbound" ...
##  $ parent_station_name       : chr  "Davis" "Porter" "Davis" "Harvard" ...
##  $ heading                   : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ stop_seq                  : int  2 3 2 4 3 5 4 6 2 5 ...
## i :  int 100
## iDate :  Date[1:1], format: "2016-05-04"
## MattapanLineRoute : 'data.frame':    2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Outbound" "Inbound"
##  $ stop          :List of 2
## north_a : 'data.frame':  5 obs. of  2 variables:
##  $ stop_id  : Factor w/ 5 levels "70086","70088",..: 5 4 3 2 1
##  $ next_stop: Factor w/ 5 levels "70084","70086",..: 5 4 3 2 1
## north_b : 'data.frame':  6 obs. of  2 variables:
##  $ stop_id  : Factor w/ 6 levels "70096","70098",..: 6 5 4 3 2 1
##  $ next_stop: Factor w/ 6 levels "70084","70096",..: 6 5 4 3 2 1
## north_main : 'data.frame':   11 obs. of  2 variables:
##  $ stop_id  : Factor w/ 11 levels "70064","70066",..: 11 10 9 8 7 6 5 4 3 2 ...
##  $ next_stop: Factor w/ 11 levels "70061","70064",..: 11 10 9 8 7 6 5 4 3 2 ...
## northbound_inbound :  chr [1:14] "Braintree" "Quincy Adams" "Quincy Center" ...
## northbound_outbound :  chr [1:7] "Charles" "Kendal" "Central" "Harvard" "Porter" ...
## Orange : 'data.frame':   20 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70036" "70034" "70032" "70278" ...
##  $ stop_name : chr  "Oak Grove" "Malden - Inbound" "Wellington - Inbound" "Assembly - Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "Orange" "Orange" "Orange" "Orange" ...
## orange_line_alert : Classes 'tbl_df' and 'data.frame':   139 obs. of  7 variables:
##  $ text                  : chr  "#OrangeLine experiencing minor delays due to a disabled train at Oak Grove #mbta" "#OrangeLine experiencing moderate delays due to police action at North Station #mbta" "#OrangeLine experiencing minor delays due to a signal problem at Roxbury Crossing #mbta" "#OrangeLine experiencing moderate northbound delays due to police action at Green Street. #mbta" ...
##  $ created               : POSIXct, format: "2016-01-25 21:17:54" "2016-01-26 21:21:20" ...
##  $ favoriteCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ retweetCount          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ arr_dt                : POSIXct, format: "2016-01-25 21:17:54" "2016-01-26 21:21:20" ...
##  $ severity              : num  1 2 1 2 2 2 2 1 1 2 ...
##  $ alerts_at_station_code: chr  "70036_Oak Grove" "70026_North Station - Orange Line Inbound" "70008_Roxbury Crossing - Outbound" "70002_Green Street - Outbound" ...
## orange_line_stop_names_codes : 'data.frame': 20 obs. of  1 variable:
##  $ stop_code_name: chr  "70036_Oak Grove" "70034_Malden - Inbound" "70032_Wellington - Inbound" "70278_Assembly - Inbound" ...
## orangealerts : Classes 'tbl_df', 'tbl' and 'data.frame': 94 obs. of  4 variables:
##  $ arr_dt   : POSIXct, format: "2016-01-25 21:17:54" "2016-01-26 21:21:20" ...
##  $ severity : num  1 2 1 2 2 2 2 1 2 2 ...
##  $ line     : chr  "Orange" "Orange" "Orange" "Orange" ...
##  $ stop_code: int  70036 70026 70008 70002 70016 70024 70020 70012 70028 70278 ...
## orangeline : 'data.frame':   20 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70036" "70034" "70032" "70278" ...
##  $ stop_name : chr  "Oak Grove" "Malden " "Wellington " "Assembly " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "Orange" "Orange" "Orange" "Orange" ...
## OrangeLineRoute : 'data.frame':  2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Southbound" "Northbound"
##  $ stop          :List of 2
## Red : 'data.frame':  23 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "Red" "Red" "Red" "Red" ...
## red_line_alert : Classes 'tbl_df' and 'data.frame':  176 obs. of  8 variables:
##  $ text                  : chr  "Mattapan Trolley experiencing moderate delays due to a disabled train at Central Ave #mbta" "#RedLine experiencing minor northbound delays between North Quincy and JFK/UMass due to a signal problem. #mbta" "Shuttle buses replacing #RedLine service North Quincy to JFK due to disabled work equipment. Customers may utilize the Commuter"| __truncated__ "#RedLine experiencing moderate residual delays due to earlier disabled work equipment. #mbta" ...
##  $ created               : POSIXct, format: "2016-01-25 23:36:17" "2016-01-26 11:46:33" ...
##  $ favoriteCount         : int  0 0 1 0 0 0 0 1 0 0 ...
##  $ retweetCount          : int  0 0 2 0 0 0 0 0 0 0 ...
##  $ arr_dt                : POSIXct, format: "2016-01-25 23:36:17" "2016-01-26 11:46:33" ...
##  $ bounded               : chr  "" "northbound" "" "" ...
##  $ severity              : num  2 0 0 0 2 2 1 2 0 2 ...
##  $ alerts_at_station_code: chr  "70069_Central - Inbound, 70070_Central - Outbound" "" "" "" ...
## redalerts : Classes 'tbl_df', 'tbl' and 'data.frame':    81 obs. of  4 variables:
##  $ arr_dt   : POSIXct, format: "2016-01-25 23:36:17" "2016-01-28 23:00:48" ...
##  $ severity : num  2 2 2 1 2 0 2 0 0 2 ...
##  $ line     : chr  "Red" "Red" "Red" "Red" ...
##  $ stop_code: int  70069 70061 70073 70073 70077 70083 70101 70088 70066 70069 ...
## RedAshmont : 'data.frame':   17 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "RedAshmont" "RedAshmont" "RedAshmont" "RedAshmont" ...
## redAshmontline : 'data.frame':   17 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis " "Porter " "Harvard " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "RedAshmont" "RedAshmont" "RedAshmont" "RedAshmont" ...
## RedBraintree : 'data.frame': 18 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "RedBraintree" "RedBraintree" "RedBraintree" "RedBraintree" ...
## redBraintreeline : 'data.frame': 18 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis " "Porter " "Harvard " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "RedBraintree" "RedBraintree" "RedBraintree" "RedBraintree" ...
## redline : 'data.frame':  23 obs. of  6 variables:
##  $ stop_order: chr  "1" "10" "20" "30" ...
##  $ stop_id   : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name : chr  "Alewife" "Davis " "Porter " "Harvard " ...
##  $ stop_lat  : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon  : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
##  $ line      : chr  "Red" "Red" "Red" "Red" ...
## RedLineRoute : 'data.frame': 2 obs. of  3 variables:
##  $ direction_id  : chr  "0" "1"
##  $ direction_name: chr  "Southbound" "Northbound"
##  $ stop          :List of 2
## regexp :  chr "[[:digit:]]+"
## schedule_dataset : Classes 'tbl_df', 'tbl' and 'data.frame': 1101208 obs. of  10 variables:
##  $ trip_id       : chr  "29533175" "29533175" "29533175" "29533175" ...
##  $ arrival_time  : chr  "06:40:00" "06:42:00" "06:44:00" "06:47:00" ...
##  $ departure_time: chr  "06:40:00" "06:42:00" "06:44:00" "06:47:00" ...
##  $ stop_id       : chr  "70094" "70092" "70090" "70088" ...
##  $ stop_sequence : int  50 60 70 80 90 110 120 130 140 150 ...
##  $ stop_headsign : logi  NA NA NA NA NA NA ...
##  $ pickup_type   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ drop_off_type : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ arrival_date  : Date, format: "2016-01-25" "2016-01-25" ...
##  $ departure_date: Date, format: "2016-01-25" "2016-01-25" ...
## seconds_of_day : 'data.frame':   27 obs. of  3 variables:
##  $ interval_start_hour: num  0 1 2 3 4 5 6 7 8 9 ...
##  $ hour_start         : num  0 3600 7200 10800 14400 18000 21600 25200 28800 32400 ...
##  $ hour_end           : num  3600 7200 10800 14400 18000 21600 25200 28800 32400 36000 ...
## servicesAdded : Classes 'tbl_df', 'tbl' and 'data.frame':    0 obs. of  1 variable:
##  $ service_id: chr 
## servicesRemoved :  chr [1:3] "RTL22016-hmb26011-Weekday-01" ...
## south_a : 'data.frame':  5 obs. of  2 variables:
##  $ stop_id  : Factor w/ 5 levels "70083","70085",..: 1 2 3 4 5
##  $ next_stop: Factor w/ 5 levels "70085","70087",..: 1 2 3 4 5
## south_b : 'data.frame':  6 obs. of  2 variables:
##  $ stop_id  : Factor w/ 6 levels "70083","70095",..: 1 2 3 4 5 6
##  $ next_stop: Factor w/ 6 levels "70095","70097",..: 1 2 3 4 5 6
## south_main : 'data.frame':   11 obs. of  2 variables:
##  $ stop_id  : Factor w/ 11 levels "70061","70063",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ next_stop: Factor w/ 11 levels "70063","70065",..: 1 2 3 4 5 6 7 8 9 10 ...
## startTime :  POSIXct[1:1], format: "2016-01-25 04:00:00"
## stop_ids :  chr [1:44] "70063" "70065" "70067" "70069" "70071" ...
## stop_names_codes : 'data.frame': 43 obs. of  1 variable:
##  $ stop_code_name: chr  "70061_Alewife" "70063_Davis - Inbound" "70064_Davis - Outbound" "70065_Porter - Inbound" ...
## stop_sequence : 'data.frame':    46 obs. of  5 variables:
##  $ stop_id            : chr  "70061" "70063" "70065" "70067" ...
##  $ stop_name          : chr  "Alewife" "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" ...
##  $ parent_station_name: chr  "Alewife" "Davis" "Porter" "Harvard" ...
##  $ heading            : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ stop_seq           : int  1 2 3 4 5 6 7 8 9 10 ...
## t.lub :  POSIXct[1:441346], format: "2016-01-25 05:20:13" "2016-01-25 05:22:26" ...
## Tarchive_cal : Classes 'tbl_df', 'tbl' and 'data.frame': 73 obs. of  11 variables:
##  $ service_id: chr  "RTL12016-hmb16wa6-Saturday-01" "RTL12016-hmb16011-Weekday-01" "RTL12016-hmb16no1-Weekday-01" "RTL12016-hmb16no6-Saturday-01" ...
##  $ monday    : int  0 1 1 0 0 1 0 0 1 1 ...
##  $ tuesday   : int  0 1 1 0 0 0 0 0 1 1 ...
##  $ wednesday : int  0 1 1 0 0 0 0 0 1 1 ...
##  $ thursday  : int  0 1 1 0 0 0 0 0 1 1 ...
##  $ friday    : int  0 1 1 0 0 0 0 0 1 1 ...
##  $ saturday  : int  1 0 0 1 0 0 1 0 0 0 ...
##  $ sunday    : int  0 0 0 0 1 0 0 1 0 0 ...
##  $ start_date: Date, format: "2016-02-15" "2016-02-16" ...
##  $ end_date  : Date, format: "2016-02-15" "2016-03-17" ...
##  $ zip_file  : chr  "MBTA_GTFS" "MBTA_GTFS" "MBTA_GTFS" "MBTA_GTFS" ...
## Tarchive_calDates : Classes 'tbl_df', 'tbl' and 'data.frame':    282 obs. of  3 variables:
##  $ service_id    : chr  "RTL12016-hmb16wa6-Saturday-01" "RTL12016-hmb16011-Weekday-01" "RTL12016-hmb16011-Weekday-01" "RTL12016-hmb16011-Weekday-01" ...
##  $ date          : Date, format: "2016-02-15" "2016-02-19" ...
##  $ exception_type: int  1 2 2 2 2 2 2 2 2 2 ...
## Tarchive_stops : 'data.frame':   8379 obs. of  11 variables:
##  $ stop_id            : Factor w/ 8379 levels "1","10","10000",..: 8204 8205 8206 8207 8208 8209 8210 8211 8212 8213 ...
##  $ stop_code          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ stop_name          : Factor w/ 7470 levels " Franklin St @ Washington St",..: 761 768 803 745 817 822 848 854 883 886 ...
##  $ stop_desc          : logi  NA NA NA NA NA NA ...
##  $ stop_lat           : num  42.4 42.3 42.3 42.4 42.4 ...
##  $ stop_lon           : num  -71.1 -71.1 -71.1 -71 -71.1 ...
##  $ zone_id            : logi  NA NA NA NA NA NA ...
##  $ stop_url           : logi  NA NA NA NA NA NA ...
##  $ location_type      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ parent_station     : Factor w/ 124 levels "","place-alfcl",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ wheelchair_boarding: int  1 2 1 1 1 1 1 1 2 1 ...
## Tarchive_stopTimes : Classes 'tbl_df', 'tbl' and 'data.frame':   244499 obs. of  8 variables:
##  $ trip_id       : chr  "29176129" "29176129" "29176129" "29176129" ...
##  $ arrival_time  : chr  "19:25:00" "19:26:00" "19:28:00" "19:30:00" ...
##  $ departure_time: chr  "19:25:00" "19:26:00" "19:28:00" "19:30:00" ...
##  $ stop_id       : chr  "70001" "70003" "70005" "70007" ...
##  $ stop_sequence : int  1 10 20 30 40 50 60 70 80 90 ...
##  $ stop_headsign : logi  NA NA NA NA NA NA ...
##  $ pickup_type   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ drop_off_type : int  1 0 0 0 0 0 0 0 0 0 ...
## Tarchive_trips : Classes 'tbl_df', 'tbl' and 'data.frame':   22636 obs. of  9 variables:
##  $ route_id             : Factor w/ 216 levels "1","10","100",..: 215 215 215 215 215 215 215 215 215 215 ...
##  $ service_id           : chr  "RTL22016-hmo26bk1-Weekday-01" "RTL22016-hmo26bk1-Weekday-01" "RTL22016-hmo26bk1-Weekday-01" "RTL22016-hmo26bk1-Weekday-01" ...
##  $ trip_id              : chr  "30745604" "30745615" "30745614" "30745613" ...
##  $ trip_headsign        : chr  "Oak Grove" "Oak Grove" "Oak Grove" "Oak Grove" ...
##  $ trip_short_name      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ direction_id         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ block_id             : chr  "O903_-25" "O903_-5" "O903_-25" "O903_-21" ...
##  $ shape_id             : chr  "903_0017" "903_0017" "903_0017" "903_0017" ...
##  $ wheelchair_accessible: int  1 1 1 1 1 1 1 1 1 1 ...
## TArchiveURLs :  chr [1:2] "http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip" ...
## TFormat :  chr "&format=json"
## THeadwaysURL :  chr "http://realtime.mbta.com/developer/api/v2.1/headways"
## TKeyJeff :  chr "?api_key=tNprgYF1K027zEIVluAkCw"
## todaysServices : Classes 'tbl_df', 'tbl' and 'data.frame':   3 obs. of  1 variable:
##  $ service_id: chr  "RTL22016-hmb26011-Weekday-01" "RTL22016-hmo26011-Weekday-01" "RTL22016-hms26011-Weekday-01"
## todaysStops : Classes 'tbl_df', 'tbl' and 'data.frame':  7702 obs. of  10 variables:
##  $ trip_id       : chr  "30453590" "30453590" "30453590" "30453590" ...
##  $ arrival_time  : chr  "06:40:00" "06:42:00" "06:44:00" "06:47:00" ...
##  $ departure_time: chr  "06:40:00" "06:42:00" "06:44:00" "06:47:00" ...
##  $ stop_id       : chr  "70094" "70092" "70090" "70088" ...
##  $ stop_sequence : int  50 60 70 80 90 110 120 130 140 150 ...
##  $ stop_headsign : logi  NA NA NA NA NA NA ...
##  $ pickup_type   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ drop_off_type : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ arrival_date  : Date, format: "2016-05-04" "2016-05-04" ...
##  $ departure_date: Date, format: "2016-05-04" "2016-05-04" ...
## todaysTrips : Classes 'tbl_df', 'tbl' and 'data.frame':  440 obs. of  9 variables:
##  $ route_id             : Factor w/ 216 levels "1","10","100",..: 216 216 216 216 216 216 216 216 216 216 ...
##  $ service_id           : chr  "RTL22016-hms26011-Weekday-01" "RTL22016-hms26011-Weekday-01" "RTL22016-hms26011-Weekday-01" "RTL22016-hms26011-Weekday-01" ...
##  $ trip_id              : chr  "30453879" "30453881" "30453716" "30453715" ...
##  $ trip_headsign        : chr  "Alewife" "Alewife" "Alewife" "Alewife" ...
##  $ trip_short_name      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ direction_id         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ block_id             : chr  "S931_-27" "S931_-34" "S931_-31" "S931_-33" ...
##  $ shape_id             : chr  "931_0010" "931_0010" "931_0010" "931_0010" ...
##  $ wheelchair_accessible: int  1 1 1 1 1 1 1 1 1 1 ...
## total_lateness : Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 1544 obs. of  8 variables:
##  $ interval_start_hour: num  5 5 5 5 5 5 5 5 5 5 ...
##  $ stop_id            : chr  "70063" "70063" "70065" "70065" ...
##  $ stop_seq           : int  2 2 3 3 21 21 4 4 20 20 ...
##  $ parent_station_name: chr  "Davis" "Davis" "Porter" "Porter" ...
##  $ heading            : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ weekend            : chr  "Weekday" "Weekend" "Weekday" "Weekend" ...
##  $ total_lateness     : num  29817 4102 31928 4230 2653 ...
##  $ seconds_observed   : num  252000 100800 252000 100800 252000 ...
## travel_times : 'data.frame': 441180 obs. of  19 variables:
##  $ direction                : chr  "0" "0" "0" "0" ...
##  $ dep_dt                   : POSIXct, format: "2016-01-25 05:20:13" "2016-01-25 05:22:26" ...
##  $ arr_dt                   : POSIXct, format: "2016-01-25 05:21:23" "2016-01-25 05:23:59" ...
##  $ travel_time_sec          : int  70 93 147 102 71 295 115 76 134 119 ...
##  $ benchmark_travel_time_sec: int  120 180 180 120 120 120 180 120 180 180 ...
##  $ from_stop                : chr  "70063" "70065" "70067" "70069" ...
##  $ to_stop                  : chr  "70065" "70067" "70069" "70071" ...
##  $ dep_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ dep_t                    : chr  "05:20:13" "05:22:26" "05:24:54" "05:28:11" ...
##  $ arr_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ arr_t                    : chr  "05:21:23" "05:23:59" "05:27:21" "05:29:53" ...
##  $ time_delta               : int  -50 -87 -33 -18 -49 175 -65 -44 -46 -61 ...
##  $ dt                       : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ is_weekend               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ dep_time                 :Class 'difftime'  atomic [1:441180] 5.34 5.37 5.42 5.47 5.5 ...
##  $ stop_name                : chr  "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" "Central - Inbound" ...
##  $ parent_station_name      : chr  "Davis" "Porter" "Harvard" "Central" ...
##  $ heading                  : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ stop_seq                 : int  2 3 4 5 2 6 3 2 4 3 ...
## travel_times_alerts : 'data.frame':  441346 obs. of  23 variables:
##  $ direction                : chr  "0" "0" "0" "0" ...
##  $ dep_dt                   : POSIXct, format: "2016-01-25 05:20:13" "2016-01-25 05:22:26" ...
##  $ arr_dt                   : POSIXct, format: "2016-01-25 05:21:23" "2016-01-25 05:23:59" ...
##  $ travel_time_sec          : int  70 93 147 102 71 295 115 76 134 119 ...
##  $ benchmark_travel_time_sec: int  120 180 180 120 120 120 180 120 180 180 ...
##  $ from_stop                : int  70063 70065 70067 70069 70063 70071 70065 70063 70067 70065 ...
##  $ to_stop                  : int  70065 70067 70069 70071 70065 70073 70067 70065 70069 70067 ...
##  $ dep_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ dep_t                    : chr  "05:20:13" "05:22:26" "05:24:54" "05:28:11" ...
##  $ arr_d                    : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ arr_t                    : chr  "05:21:23" "05:23:59" "05:27:21" "05:29:53" ...
##  $ time_delta               : int  -50 -87 -33 -18 -49 175 -65 -44 -46 -61 ...
##  $ dt                       : POSIXct, format: "2016-01-25" "2016-01-25" ...
##  $ is_weekend               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ dep_time                 :Class 'difftime'  atomic [1:441346] 5.34 5.37 5.42 5.47 5.5 ...
##  $ stop_name                : chr  "Davis - Inbound" "Porter - Inbound" "Harvard - Inbound" "Central - Inbound" ...
##  $ parent_station_name      : chr  "Davis" "Porter" "Harvard" "Central" ...
##  $ heading                  : chr  "Southbound" "Southbound" "Southbound" "Southbound" ...
##  $ stop_seq                 : int  2 3 4 5 2 6 3 2 4 3 ...
##  $ text                     : chr  NA NA NA NA ...
##  $ created                  : POSIXct, format: NA NA ...
##  $ favoriteCount            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ retweetCount             : int  NA NA NA NA NA NA NA NA NA NA ...
## travel_times_alerts2 : 'data.frame': 176 obs. of  27 variables:
##  $ direction                : chr  NA NA NA NA ...
##  $ dep_dt                   : POSIXct, format: NA NA ...
##  $ arr_dt                   : POSIXct, format: "2016-01-25 23:36:17" "2016-01-26 11:46:33" ...
##  $ travel_time_sec          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ benchmark_travel_time_sec: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ from_stop                : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ to_stop                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ dep_d                    : POSIXct, format: NA NA ...
##  $ dep_t                    : chr  NA NA NA NA ...
##  $ arr_d                    : POSIXct, format: NA NA ...
##  $ arr_t                    : chr  NA NA NA NA ...
##  $ time_delta               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ dt                       : POSIXct, format: NA NA ...
##  $ is_weekend               : logi  NA NA NA NA NA NA ...
##  $ dep_time                 :Class 'difftime'  atomic [1:176] NA NA NA NA NA NA NA NA NA NA ...
##  $ stop_name                : chr  NA NA NA NA ...
##  $ parent_station_name      : chr  NA NA NA NA ...
##  $ heading                  : chr  NA NA NA NA ...
##  $ stop_seq                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ text                     : chr  "Mattapan Trolley experiencing moderate delays due to a disabled train at Central Ave #mbta" "#RedLine experiencing minor northbound delays between North Quincy and JFK/UMass due to a signal problem. #mbta" "Shuttle buses replacing #RedLine service North Quincy to JFK due to disabled work equipment. Customers may utilize the Commuter"| __truncated__ "#RedLine experiencing moderate residual delays due to earlier disabled work equipment. #mbta" ...
##  $ created                  : POSIXct, format: "2016-01-25 23:36:17" "2016-01-26 11:46:33" ...
##  $ favoriteCount            : int  0 0 1 0 0 0 0 1 0 0 ...
##  $ retweetCount             : int  0 0 2 0 0 0 0 0 0 0 ...
##  $ bounded                  : chr  "" "northbound" "" "" ...
##  $ severity                 : num  2 0 0 0 2 2 1 2 0 2 ...
##  $ alerts_at_station_code   : chr  "70069_Central - Inbound, 70070_Central - Outbound" "" "" "" ...
##  $ stop_code                : int  70069 NA NA NA 70061 70073 70073 70077 70083 70101 ...
## travel_times_alerts3 : 'data.frame': 81 obs. of  7 variables:
##  $ dep_dt     : POSIXct, format: NA NA ...
##  $ arr_dt     : POSIXct, format: "2016-01-25 23:36:17" "2016-01-28 23:00:48" ...
##  $ severity   : num  2 2 2 1 2 0 2 0 0 2 ...
##  $ stop_code  : int  70069 70061 70073 70073 70077 70083 70101 70088 70066 70069 ...
##  $ stop_name.y: Factor w/ 7470 levels " Franklin St @ Washington St",..: 1802 761 1985 1985 2504 805 5556 5978 5487 1802 ...
##  $ stop_lat   : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ stop_lon   : num  -71.1 -71.1 -71.1 -71.1 -71.1 ...
## traveltime : Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 42 obs. of  3 variables:
##  $ start_stop    : chr  "Alewife" "Andrew - Inbound" "Andrew - Outbound" "Andrew - Outbound" ...
##  $ end_stop      : Factor w/ 7470 levels " Franklin St @ Washington St",..: 2376 1361 3605 3607 6058 5519 6160 805 3610 3209 ...
##  $ avg_traveltime: num  112.3 122.3 101.6 102.5 72.9 ...
## traveltimes_day : function (fromstation, tostation)  
## TRouteURL :  chr "http://realtime.mbta.com/developer/api/v2/stopsbyroute"
## TTravelURL :  chr "http://realtime.mbta.com/developer/api/v2.1/traveltimes"
## weather : 'data.frame':  6 obs. of  1 variable:
##  $ X..DOCTYPE.HTML.PUBLIC....IETF..DTD.HTML.2.0..EN.: Factor w/ 6 levels "</head><body>",..: 3 6 1 2 4 5
## weekday_southbound : Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 353 obs. of  3 variables:
##  $ parent_station_name: chr  "Andrew" "Andrew" "Andrew" "Andrew" ...
##  $ interval_start_hour: num  5 6 7 8 9 10 11 12 13 14 ...
##  $ percent_late       : num  0.0379 0.2703 0.363 0.3008 0.3128 ...
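The dump above hints at how lateness was derived: in `travel_times`, `time_delta` appears to equal `travel_time_sec - benchmark_travel_time_sec` (e.g. 70 − 120 = −50, 93 − 180 = −87), so a trip reads as "late" when that delta is positive. A minimal sketch of that reconstruction, using a toy three-row frame with values taken from the dump (the `is_late` column is our own illustrative name, not one from the dataset):

```r
# Toy reconstruction of the assumed time_delta logic in travel_times.
# Values copied from the str() output above.
travel_times_demo <- data.frame(
  travel_time_sec           = c(70L, 93L, 295L),
  benchmark_travel_time_sec = c(120L, 180L, 120L)
)

# time_delta: observed travel time minus the MBTA benchmark for that segment
travel_times_demo$time_delta <- travel_times_demo$travel_time_sec -
  travel_times_demo$benchmark_travel_time_sec

# Flag segments that ran slower than benchmark (hypothetical helper column)
travel_times_demo$is_late <- travel_times_demo$time_delta > 0
```

Under this assumption the third row (295 s against a 120 s benchmark) is the only late one, matching the single positive `time_delta` (175) visible in the dump.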